SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES
Evaluating the effectiveness of large language models in identifying communicatively significant errors in written works of students learning Russian as a foreign language
Abstract:
This article examines the capability of contemporary large language models (LLMs), such as GPT-5 and DeepSeek-R1, to identify and classify communicative errors in the written work of students learning Russian as a foreign language (RFL). Whereas existing tools focus primarily on formal errors, this study emphasizes the communicative dimension, evaluating the extent to which an error disrupts comprehension (communicatively significant errors) or merely violates linguistic norms (communicatively insignificant errors). To this end, a corpus of written work by B2-level students (TORFL-2) was compiled and annotated by experts, and a multi-stage pipeline for testing LLMs was developed, incorporating structured prompting and heuristic voting to improve the reliability of results. The experiment revealed that while the models can localize errors with a reasonable degree of accuracy, they have considerable difficulty classifying them communicatively: they systematically underestimate the degree to which an error impedes comprehension, confuse error types, and struggle to identify multiple errors within a single fragment. The study demonstrates both the potential and the current limitations of LLMs as tools for automated, communicatively oriented feedback in educational technologies.
Keywords: LLM, natural language processing, RFL, TORFL, communicatively significant errors, communicatively insignificant errors, automated text assessment, error classification.
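The abstract mentions a heuristic voting step used to stabilize model output. As a minimal illustrative sketch only, the fragment below shows one plausible form such voting could take: a majority vote over repeated LLM runs with ties broken toward the more severe label. The labels "CSE"/"CIE" and the function name are hypothetical; the paper's actual heuristic is not specified in this section.

```python
from collections import Counter

# Hypothetical labels: "CSE" = communicatively significant error,
# "CIE" = communicatively insignificant error.
SEVERITY_ORDER = ["CIE", "CSE"]  # ascending severity

def vote_on_classification(labels: list[str]) -> str:
    """Majority vote over repeated LLM runs on the same fragment.

    Ties are broken toward the more severe label, since the study
    reports that models tend to under-rate error severity.
    """
    counts = Counter(labels)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    return max(winners, key=SEVERITY_ORDER.index)

if __name__ == "__main__":
    sampled = ["CIE", "CSE", "CSE", "CIE", "CSE"]  # five sampled model answers
    print(vote_on_classification(sampled))  # -> "CSE"
```

The severity-biased tie-break is one design choice among several; an alternative would be to resolve ties with an additional model query or to flag the fragment for human review.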