Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia

Dokl. RAN. Math. Inf. Proc. Upr., 2025, Volume 527, Pages 146–155 (Mi danma674)

SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES

Discriminative lemmatization of abbreviations in the era of LLMs

A. V. Glazkova (a, b), I. A. Smal (c), O. N. Lyashevskaya (d, e), D. A. Morozov (b, c)

a Tyumen State University, Tyumen, Russia
b Russian National Corpus, Moscow, Russia
c Novosibirsk State University, Novosibirsk, Russia
d National Research University Higher School of Economics, Moscow
e V. V. Vinogradov Russian Language Institute of the Russian Academy of Sciences

Abstract: This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of four context-aware approaches: (1) masked language model ranking, (2) binary classification, (3) multi-class classification, and (4) prompt-based learning. Special attention is given to cases of contextual ambiguity, where the same abbreviation within a single text fragment corresponds to different lemmas. The results demonstrate that fine-tuned multi-class classification achieves the highest quality. However, with limited training data, both prompt-based learning and masked language model ranking show promising results. Moreover, the effectiveness of these approaches increases in cases of contextual ambiguity. The study contributes to the development of Russian text processing methods by providing practical recommendations for selecting architectures for abbreviation lemmatization tasks.
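
The discriminative setup described in the abstract, choosing a lemma from a fixed candidate set rather than generating one, can be illustrated with a minimal sketch. The names `select_lemma`, `toy_score`, and `HINTS` are hypothetical, and the keyword-overlap scorer is only an assumed stand-in for a real context-aware model (for example, a masked language model or a classifier); it is not the authors' implementation.

```python
from typing import Callable, Sequence

# A scorer maps (context, candidate lemma) to a plausibility score.
Scorer = Callable[[str, str], float]


def select_lemma(context: str, candidates: Sequence[str], score: Scorer) -> str:
    """Return the highest-scoring candidate; ties go to the earliest one.

    The result is always a member of `candidates`, so the selector
    cannot produce a word form outside the fixed candidate set.
    """
    if not candidates:
        raise ValueError("candidate set must not be empty")
    return max(candidates, key=lambda lemma: score(context, lemma))


# Hypothetical keyword hints, used only to keep the sketch self-contained.
# A real system would replace this with a trained context-aware model.
HINTS = {
    "год": {"родился", "году", "лет"},        # "year"
    "город": {"живет", "улица", "столица"},   # "city"
}


def toy_score(context: str, lemma: str) -> float:
    """Count context tokens that appear among the hint words for `lemma`."""
    tokens = set(context.lower().split())
    return float(len(tokens & HINTS.get(lemma, set())))


if __name__ == "__main__":
    candidates = ["год", "город"]  # possible lemmas of the abbreviation "г."
    print(select_lemma("родился в 1990 г.", candidates, toy_score))
    print(select_lemma("он живет в г. Тюмень", candidates, toy_score))
```

Because the scorer is passed in as a function, any of the four approaches named in the abstract could in principle be plugged in without changing the selection step.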

Keywords: lemmatization, abbreviations, Russian language, discriminative methods, text classification, natural language processing.

UDC: 004.8

Received: 21.08.2025
Accepted: 22.09.2025

DOI: 10.7868/S2686954325070124




