
Proceedings of ISP RAS, 2025 Volume 37, Issue 5, Pages 123–130 (Mi tisp1046)

LLM-based interactive code generation: empirical evaluation

D. S. Shaikhelislamov (a, b), M. D. Drobyshevskiy (a, b), A. A. Belevancev (c, b)

a Moscow Institute of Physics and Technology
b Ivannikov Institute for System Programming of the RAS
c Lomonosov Moscow State University

Abstract: Recently, large language models (LLMs) pretrained on code have demonstrated strong capabilities in generating programs from informal natural-language intent. However, LLM-generated code is prone to bugs. Developers interacting with LLMs seek trusted code and, ideally, clear indications of potential bugs and vulnerabilities; verified code can mitigate the business risks associated with adopting generated code. We use CodePatchLLM, a model-agnostic framework that extends an LLM with feedback from the Svace static analyzer to improve code generation quality. We evaluate CodePatchLLM on four popular LLMs across three datasets. Our experiments show an average absolute reduction of 19.1% in static analyzer warnings for Java across all datasets and models, while preserving pass@1 code generation accuracy.
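
The feedback loop described in the abstract can be illustrated with a minimal sketch: the LLM produces a candidate solution, a static analyzer checks it, and any warnings are fed back to the model as a repair request. The Python code below is only an illustration under assumptions; generate_code, run_static_analyzer, and the prompt format are hypothetical placeholders, not the actual CodePatchLLM or Svace interfaces described in the paper.

from typing import Callable, List

def refine_with_analyzer_feedback(
        task: str,
        generate_code: Callable[[str], str],              # LLM call: prompt -> code (hypothetical)
        run_static_analyzer: Callable[[str], List[str]],  # analyzer: code -> warning messages (hypothetical)
        max_rounds: int = 3,
) -> str:
    """Generate code, then iteratively ask the LLM to fix analyzer warnings."""
    code = generate_code(task)
    for _ in range(max_rounds):
        warnings = run_static_analyzer(code)
        if not warnings:
            break  # no remaining warnings: accept the candidate
        # Feed the analyzer output back to the model as a repair request.
        repair_prompt = (
            f"{task}\n\nPrevious solution:\n{code}\n\n"
            "A static analyzer reported the following issues; fix them "
            "without changing the intended behavior:\n- " + "\n- ".join(warnings)
        )
        code = generate_code(repair_prompt)
    return code

Stopping as soon as the analyzer reports no warnings leaves a warning-free candidate unchanged, which is consistent with the paper's observation that pass@1 accuracy is preserved.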

Keywords: large language model; code verification; trusted code.

DOI: 10.15514/ISPRAS-2025-37(5)-9


