Sistemy i Sredstva Informatiki [Systems and Means of Informatics]

Sistemy i Sredstva Inform., 2025, Volume 35, Issue 1, Pages 111–124 (Mi ssi967)

Integration of a digital dictionary with parallel corpus texts: a new theoretical approach

D. O. Dobrovol'skij (a, b, c), I. M. Zatsman (c)

a Vinogradov Russian Language Institute of the Russian Academy of Sciences, 18 / 2 Volkhonka Str., Moscow 119019, Russian Federation
b Institute of Linguistics of the Russian Academy of Sciences, 1 bld. 1 Bolshoy Kislovsky Lane, Moscow 125009, Russian Federation
c Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper addresses the integration of a digital multilingual dictionary (exemplified by a German–Russian dictionary) with the texts of a parallel corpus within a lexicographic information system comprising three components: (i) a digital multilingual dictionary; (ii) a corpus serving as a repository of parallel texts; and (iii) a database of annotated translation correspondences together with two knowledge bases. The proposed approach to integration synthesizes several conceptual procedures: applying the principle of multilevel structuring of dictionary entries, forming annotated translation correspondences for polysemous words and set phrases together with their translations, and linking the digital multilingual dictionary to the repository of parallel texts at the level of individual meanings of polysemous words and set phrases. Until now, such lexicographic information systems have been developed exclusively for monolingual dictionaries, with links established by lemma only. The aim of the paper is to describe the proposed approach as a theoretical basis for developing a lexicographic information system.

Keywords: lexicographic information system, parallel texts, digital multilingual dictionary, corpus, database of annotated translation correspondences.

Received: 14.10.2024
Accepted: 15.02.2025

DOI: 10.14357/08696527250106


