
Proceedings of ISP RAS, 2023 Volume 35, Issue 5, Pages 193–214 (Mi tisp823)

Named entity recognition for code review comments

V. V. Kachanov (a, b), A. S. Khitrov (a, c), S. I. Markov (a)

a Ivannikov Institute for System Programming of the RAS
b Moscow Institute of Physics and Technology
c Lomonosov Moscow State University

Abstract: This paper addresses the problem of named entity recognition in source code review comments. It provides a comparative analysis of existing approaches and proposes methods to improve recognition quality. The proposed and implemented improvements include methods for handling data imbalance, improved tokenization of input data, the use of large amounts of unlabeled data, and the use of additional binary classifiers. To assess quality, a new dataset of 3,000 user code reviews was collected and manually labeled. The proposed improvements are shown to significantly increase quality metrics calculated both at the token level (+22%) and at the entire entity level (+13%).
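
The distinction between token-level and entity-level metrics can be illustrated with a minimal sketch. It assumes BIO tagging and an F1 score, neither of which the abstract specifies; the labels `FUNC` and `FILE` and the example comment are hypothetical and are not taken from the paper's dataset.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """F1 from true positives and the predicted / gold totals."""
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def token_f1(gold: List[str], pred: List[str]) -> float:
    """Token level: every non-O token is scored on its own."""
    tp = sum(g == p and p != "O" for g, p in zip(gold, pred))
    return f1(tp, sum(p != "O" for p in pred), sum(g != "O" for g in gold))


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity level: a span counts only if its type and both boundaries match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    return f1(len(gold_spans & pred_spans), len(pred_spans), len(gold_spans))


# Hypothetical review comment: "Rename getUser in user service"
gold = ["O", "B-FUNC", "O", "B-FILE", "I-FILE"]
pred = ["O", "B-FUNC", "O", "B-FILE", "O"]  # second entity is cut short

print(f"token-level F1:  {token_f1(gold, pred):.2f}")
print(f"entity-level F1: {entity_f1(gold, pred):.2f}")
```

In this example a single truncated entity costs one token at the token level but an entire entity at the entity level, which is one reason the two kinds of metric can move by different amounts.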

Keywords: machine learning, named entity recognition, dataset

DOI: 10.15514/ISPRAS-2023-35(5)-13
