Vestnik Yuzhno-Ural'skogo Universiteta. Seriya Matematicheskoe Modelirovanie i Programmirovanie

Vestnik YuUrGU. Ser. Mat. Model. Progr., 2025 Volume 18, Issue 3, Pages 87–95 (Mi vyuru770)

Programming & Computer Software

Using fuzzy string comparison for automated transfer of formatting in poetic works

N. N. Teslya (a), G. N. Belyak (b)

(a) St. Petersburg Federal Research Center of the Russian Academy of Sciences, Saint-Petersburg, Russian Federation
(b) Institute of Russian Literature (Pushkinskij Dom) of the Russian Academy of Sciences, Saint-Petersburg, Russian Federation

Abstract: Creating the scientific and educational resource «Pushkin Digital» requires typesetting poetic texts using layout information from other editions. Texts may vary from one edition to another, and each edition typesets them anew according to its own rules. Manual typesetting demands close attention and considerable time from a specialist, because versions of the same text must be compared across several editions. The proposed method addresses two tasks.
First, it determines how much the texts differ between editions, which makes it possible to estimate the number of errors or deliberate changes to the text, a separate subject of study for textual scholars. Second, based on an evaluation of line differences and a fuzzy alignment of lines, it generates typesetting rules for each line, taking into account the rules applied in earlier editions.
The method was tested on 914 lyrical works by A. S. Pushkin and transferred the typesetting correctly and completely for 74.55% of the texts. For the remaining 25.45%, automatic transfer was not feasible and manual typesetting was required.
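
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it names: Levenshtein-based fuzzy comparison of lines, followed by copying a typesetting rule from the best-matching line of an earlier edition. The function names, the `indent` rule key, the greedy best-match strategy, and the 0.8 similarity threshold are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

Rule = Dict[str, int]  # hypothetical rule format, e.g. {"indent": 4}


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Distance normalised to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def transfer_formatting(
    source: List[Tuple[str, Rule]],
    target: List[str],
    threshold: float = 0.8,  # assumed cut-off, not a value from the paper
) -> List[Tuple[str, Optional[Rule]]]:
    """Give each target line the rule of its most similar source line.

    Lines whose best similarity falls below `threshold` receive None,
    meaning they are left for manual typesetting.
    """
    result: List[Tuple[str, Optional[Rule]]] = []
    for line in target:
        best_rule: Optional[Rule] = None
        best_score = 0.0
        for src_text, rule in source:
            score = similarity(line.strip().lower(), src_text.strip().lower())
            if score > best_score:
                best_rule, best_score = rule, score
        result.append((line, best_rule if best_score >= threshold else None))
    return result


# Hypothetical data: an earlier edition with known rules and a newer text.
earlier_edition = [
    ("First line of the poem", {"indent": 0}),
    ("Second line, indented", {"indent": 4}),
]
newer_edition = [
    "First line of the poem,",
    "Second line indented",
    "A completely new line",
]
transferred = transfer_formatting(earlier_edition, newer_edition)
```

In this sketch the first two lines of the newer text differ from the earlier edition only in punctuation, so they inherit its rules, while the third line has no sufficiently close counterpart and is flagged for manual work. A greedy per-line match ignores line order; an order-preserving alignment would be a natural refinement.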

Keywords: fuzzy string comparison, Levenshtein distance, formatting, text processing.

UDC: 004.912

MSC: 68T50

Received: 19.12.2024

DOI: 10.14529/mmp250308


