JOURNALS // Sistemy i Sredstva Informatiki [Systems and Means of Informatics] // Archive

Sistemy i Sredstva Inform., 2021 Volume 31, Issue 3, Pages 101–112 (Mi ssi785)

This article is cited in 11 papers

Conceptual framework for supracorpora databases

M. G. Kruzhkov

Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119133, Russian Federation

Abstract: The paper provides an overview of the concept, main structural constituents, and functions of supracorpora databases (SCDBs). Supracorpora databases are a novel type of structured information resource that significantly expands the capabilities of linguistic text corpora, parallel corpora in particular. The paper outlines the principal features and limitations of parallel corpora and demonstrates how SCDBs extend these features and overcome the limitations. Supracorpora databases allow linguistic experts to establish, record, and annotate translation correspondences between language units in the source and target texts, relying on faceted classification categories that the researchers compose themselves according to their requirements. The article also describes the general structure of the SCDB architecture developed at FRC CSC RAS, which incorporates corpus and subcorpus constituents that interact with one another as part of a common database.
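
As an illustration only, the idea of recording translation correspondences and annotating them with researcher-defined facets might be sketched as follows. This is a minimal sketch and not the SCDB implementation described in the article: the class names (`Unit`, `Correspondence`, `SupracorpusStore`), the facet names (`unit_type`, `translation_shift`), and the example values are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Unit:
    """A language unit located in a corpus text (hypothetical model)."""
    text_id: str   # identifier of the source or target text
    start: int     # character offset where the unit begins
    end: int       # character offset where the unit ends
    surface: str   # the unit as it appears in the text


@dataclass
class Correspondence:
    """A translation correspondence with its facet annotations."""
    source: Unit
    target: Unit
    facets: dict[str, str] = field(default_factory=dict)


class SupracorpusStore:
    """Stores correspondences and validates them against a facet schema.

    The schema maps each facet name to its allowed values and is
    supplied by the researcher (assumed design, for illustration).
    """

    def __init__(self, schema: dict[str, set[str]]) -> None:
        self.schema = schema
        self.records: list[Correspondence] = []

    def annotate(self, source: Unit, target: Unit, **facets: str) -> Correspondence:
        # Reject facet names or values that the schema does not define.
        for name, value in facets.items():
            if value not in self.schema.get(name, set()):
                raise ValueError(f"unknown value {value!r} for facet {name!r}")
        record = Correspondence(source, target, dict(facets))
        self.records.append(record)
        return record

    def query(self, **facets: str) -> list[Correspondence]:
        # Return records whose annotations match every requested facet.
        return [
            r for r in self.records
            if all(r.facets.get(k) == v for k, v in facets.items())
        ]


# Hypothetical facet schema composed by a researcher.
store = SupracorpusStore({
    "unit_type": {"modal_verb", "connective"},
    "translation_shift": {"none", "omission", "substitution"},
})
store.annotate(
    Unit("ru-001", 10, 16, "должен"),
    Unit("en-001", 12, 16, "must"),
    unit_type="modal_verb",
    translation_shift="none",
)
matches = store.query(unit_type="modal_verb")
```

Keeping the facet schema separate from the stored records is one way to let researchers define their own classification categories without changing the storage code; the actual architecture may differ.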

Keywords: corpus linguistics, supracorpora database, parallel corpus, linguistic annotation, information technologies, faceted classification.

Received: 14.08.2021

DOI: 10.14357/08696527210309


