Abstract:
This paper examines the Supracorpora Database of Connectives
(SCDB-Connectives), which is based on data from parallel corpora. The
SCDB-Connectives provides structural and semantic annotation of Russian
connectives and their translation correspondences in French (and, eventually, in other
languages). The annotation approach of the SCDB-Connectives is compared with recent
developments in the annotation of discourse relations, namely the Penn Discourse
Treebank (PDTB), an annotated corpus of discourse relations, and ISO 24617-8, a
proposed standard for the annotation of semantic relations in discourse. Several
important differences are discussed. The PDTB and ISO 24617-8 support the annotation
of both explicit and implicit discourse relations, whereas the SCDB-Connectives
annotates only explicit relations, i.e., those expressed by connectives. Furthermore,
the PDTB and ISO 24617-8 provide a superior framework for annotating text spans as
relation arguments, which makes it possible to annotate the attribution of these
arguments, such as the source and type of the linked propositions. In addition,
ISO 24617-8 specifies argument roles for asymmetrical
discourse relations. On the other hand, the principal advantage of the
SCDB-Connectives is that it supports the annotation of both connectives and their
translation correspondences in parallel corpora, opening up new possibilities for
contrastive studies. The SCDB-Connectives is implemented as a relational database
rather than in XML format, which helps to manage complex cross-linguistic data
efficiently.
The benefits of the semantic annotation of connectives for both theoretical and
practical purposes are also discussed.
Keywords: discourse relations; discourse connectives; corpus linguistics; parallel
corpora; supracorpora databases.