
Zap. Nauchn. Sem. POMI, 2025, Volume 546, Pages 32–47 (Mi znsl7628)

Transformer-based approaches for lemmatizing abbreviations in Russian texts

A. Glazkova [a], O. Lyashevskaya [b, c], D. Morozov [d, e], I. Smal [d]

[a] University of Tyumen
[b] Vinogradov Russian Language Institute RAS
[c] HSE University
[d] Novosibirsk State University
[e] Russian National Corpus

Abstract: This paper addresses the task of lemmatizing abbreviations in the Russian language. Abbreviation lemmatization is particularly challenging, as it involves not only transforming a word into its normal form but also correctly expanding the abbreviation. We explore two approaches to this task, both leveraging large pre-trained language models. The first approach is generative, where the lemma is produced as a textual output by the model. The second approach relies on classification models to select the most appropriate lemma for abbreviations that have multiple common expansions. The paper discusses the strengths and limitations of both approaches. The experiments are conducted on Russian texts selected from the Russian National Corpus.

Key words and phrases: lemmatization, abbreviations, morphological tagging, Russian language, text classification, generative models.

UDC: 004.912

Received: 28.02.2025

Language: English


