Sistemy i Sredstva Informatiki [Systems and Means of Informatics]

Sistemy i Sredstva Inform., 2018 Volume 28, Issue 4, Pages 168–181 (Mi ssi616)

This article is cited in 3 papers

Method for description of multiword connectives in Supracorpora databases

O. Yu. Inkova, M. G. Kruzhkov

Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: This article presents a new method for describing the structure of multiword connectives, implemented in the Supracorpora database (SCDB) of connectives. The structure of connectives remains underinvestigated, and there are no established criteria for determining the boundaries of connectives and their components. The proposed method is based on a cognitive-semantic approach that treats multiword connectives as more or less free word combinations generated in the process of speech. A two-tier faceted classification is proposed that makes it possible to annotate, on the one hand, specific tokens of connectives in texts (context annotation) and, on the other hand, the inner structure of connectives (structural annotation). The structural annotation covers two aspects: the structural type of a connective and its structural components. Based on the proposed annotation method, a system of cross-clusters is implemented that extends the search and statistical capabilities of SCDB. In addition, the method allows researchers to eliminate subjectivity during the annotation process and to fill some gaps in linguistic knowledge, for example, by gathering new data on the combinatorial capabilities of Russian connectives.
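
The two-tier annotation and the cross-clusters described in the abstract can be pictured with a small Python sketch. This is an illustration only, not the SCDB schema: the class names, field names, the "multiword" type label, the sample text identifiers, and the grouping key are all assumptions introduced here, since the abstract does not specify how SCDB stores annotations.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StructuralAnnotation:
    """Tier 2 (hypothetical): inner structure of a connective."""
    connective: str
    structural_type: str           # illustrative label, not SCDB's tag set
    components: Tuple[str, ...]    # structural components of the connective


@dataclass(frozen=True)
class ContextAnnotation:
    """Tier 1 (hypothetical): one token of a connective in a text."""
    text_id: str
    position: int
    structure: StructuralAnnotation


def cross_clusters(
    tokens: List[ContextAnnotation],
) -> Dict[Tuple[str, str], List[ContextAnnotation]]:
    """Group tokens by (structural type, component), an assumed cluster key."""
    clusters = defaultdict(list)
    for tok in tokens:
        for comp in tok.structure.components:
            clusters[(tok.structure.structural_type, comp)].append(tok)
    return dict(clusters)


# Invented sample data: two Russian multiword connectives sharing a component.
potomu_chto = StructuralAnnotation("потому что", "multiword", ("потому", "что"))
tak_chto = StructuralAnnotation("так что", "multiword", ("так", "что"))
tokens = [
    ContextAnnotation("text-1", 12, potomu_chto),
    ContextAnnotation("text-1", 40, tak_chto),
]
clusters = cross_clusters(tokens)
shared = clusters[("multiword", "что")]
```

In this sketch, the cluster keyed by the shared component groups tokens of different connectives, which is one way a query could retrieve every connective built on a given component.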

Keywords: connectives, linguistic items structure, linguistic items variation, corpus linguistics, annotation, faceted classification, supracorpora databases.

Received: 05.09.2018

DOI: 10.14357/08696527180416




