
Proceedings of ISP RAS, 2025 Volume 37, Issue 6(1), Pages 149–166 (Mi tisp1063)

Using large language models for table header recognition

I. I. Okhotin, N. O. Dorodnykh

Matrosov Institute for System Dynamics and Control Theory of Siberian Branch of Russian Academy of Sciences

Abstract: Automatic table header recognition remains challenging due to the diversity of table layouts, including multi-level headers, merged cells, and non-standard formatting. This paper proposes, for the first time, a methodology for evaluating the performance of large language models on this task using prompt engineering. The study covers eight models and six prompt strategies in zero-shot and few-shot settings on a dataset of 237 tables. The results demonstrate that model size critically affects accuracy: large models (405 billion parameters) achieve F1 $\approx$ 0.80–0.85, while small ones (7 billion parameters) reach only F1 $\approx$ 0.06–0.30. Enriching prompts with step-by-step instructions, search criteria, and examples improves results only for large models; for small ones it leads to degradation due to context overload. The largest errors occur on tables with hierarchical headers and merged cells, where even large models lose recognition accuracy. The practical significance of this paper lies in identifying optimal prompt configurations for different classes of models: short instructions are effective for large models, while step-by-step instructions with search criteria work best for medium-sized ones. This study opens up new possibilities for creating universal tools for automatic analysis of table headers.
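The abstract reports F1 scores for header recognition. As an illustration of how such a score is typically computed, the sketch below compares a model's predicted header cells against gold-standard annotations as sets of cell coordinates; the representation of headers as (row, column) pairs is an assumption for illustration, not the paper's actual evaluation code.

```python
def header_f1(predicted, gold):
    """Precision, recall, and F1 over sets of header cell coordinates.

    `predicted` and `gold` are iterables of (row, col) pairs; this
    cell-level scoring scheme is an illustrative assumption.
    """
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # correctly identified header cells
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Example: the model marks one data cell as a header by mistake.
p, r, f = header_f1([(0, 0), (0, 1), (1, 0)], [(0, 0), (0, 1)])
# precision = 2/3, recall = 1.0, F1 = 0.8
```

Set-based scoring of this kind is standard for structure-recognition tasks, since it penalizes both spurious and missed header cells symmetrically.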

Keywords: table, table headers, table structure recognition, header recognition, large language model, prompt engineering

DOI: 10.15514/ISPRAS-2025-37(6)-9
