JOURNALS // Proceedings of the Institute for System Programming of the RAS // Archive

Proceedings of ISP RAS, 2025 Volume 37, Issue 6(2), Pages 107–122 (Mi tisp1077)

Using contrastive learning for semantic interpretation of Russian-language tables

K. V. Tobola, N. O. Dorodnykh

Matrosov Institute for System Dynamics and Control Theory of Siberian Branch of Russian Academy of Sciences

Abstract: Tables are widely used to represent and store data, but they are typically not accompanied by the explicit semantics necessary for machine interpretation of their contents. Semantic table interpretation is critical for integrating structured data with knowledge graphs, but existing methods struggle with Russian-language tables because labeled data are scarce and the language has its own specific features. This paper proposes a contrastive learning-based approach that reduces dependency on manual labeling and improves column annotation quality for rare semantic types. The approach adapts contrastive learning to tabular data using augmentations (removing and shuffling cells) and a multilingual DistilBERT model trained on the unlabeled RWT corpus (7.4M columns). The learned table representations are integrated into the RuTaBERT pipeline, which reduces computational costs. Experiments show a micro-F1 of 0.974 and a macro-F1 of 0.924, outperforming some baselines. This highlights the approach’s efficiency in handling data sparsity and the features of the Russian language. The results confirm that contrastive learning captures semantic similarities between columns without explicit supervision, which is crucial for rare semantic types.
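The cell-level augmentations named in the abstract (removing and shuffling cells) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the drop probability of 0.3, and the sample column are assumptions made here for illustration, and the encoder and contrastive loss are deliberately left out. The sketch only shows how two differing "views" of one column might be produced to form a positive pair.

```python
import random

def drop_cells(cells, drop_prob, rng):
    """Remove each cell independently with probability drop_prob.

    At least one cell is kept for a non-empty column (assumption:
    an empty view would be useless as input to an encoder).
    """
    if not cells:
        return []
    kept = [c for c in cells if rng.random() >= drop_prob]
    return kept if kept else [rng.choice(cells)]

def shuffle_cells(cells, rng):
    """Return a shuffled copy of the cells; the input list is not modified."""
    shuffled = list(cells)
    rng.shuffle(shuffled)
    return shuffled

def make_views(cells, drop_prob=0.3, seed=None):
    """Build two independently augmented views of the same column.

    In a contrastive setup the two views would form a positive pair,
    while views of other columns in the batch would act as negatives.
    drop_prob=0.3 is a hypothetical value, not taken from the paper.
    """
    rng = random.Random(seed)
    view_a = shuffle_cells(drop_cells(cells, drop_prob, rng), rng)
    view_b = shuffle_cells(drop_cells(cells, drop_prob, rng), rng)
    return view_a, view_b

# Hypothetical column of Russian city names.
column = ["Москва", "Иркутск", "Казань", "Омск", "Томск"]
view_a, view_b = make_views(column, seed=0)
```

Each view is a subset of the original cells in a new order, so the pair shares the column's semantic type while differing in surface content.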

Keywords: Russian-language tables, tabular data, semantic table interpretation, semantic column annotation, knowledge graphs, self-supervised learning, contrastive learning, table representations

DOI: 10.15514/ISPRAS-2025-37(6)-23


