A. D. Sosnovikov, A. D. Zemerov, D. Yu. Turdakov, “Iterative weak supervision with LLM-guided labeling function refinement”, Proceedings of ISP RAS, 2025, Volume 37, Issue 6(2), Pages 65

Iterative weak supervision with LLM-guided labeling function refinement

A. D. Sosnovikov^ab, A. D. Zemerov^b, D. Yu. Turdakov^a

^a Ivannikov Institute for System Programming of the RAS
^b Tochka Bank

Abstract: Training high-quality classifiers in domains with limited labeled data remains a fundamental challenge in machine learning. While large language models (LLMs) have demonstrated strong zero-shot capabilities, their use as direct predictors suffers from high inference cost, prompt sensitivity, and limited interpretability. Weak supervision, in contrast, provides a scalable alternative through the aggregation of noisy labeling functions (LFs), but authoring and refining these rules traditionally requires significant manual effort. We introduce LLM-Guided Iterative Weak Labeling (LGIWL), a novel framework that integrates prompting with weak supervision in an iterative feedback loop. Rather than using an LLM for classification, we use it to synthesize and refine labeling functions based on downstream classifier errors. The generated rules are filtered using a small development set and applied to unlabeled data via a generative label model, enabling high-quality training of discriminative classifiers with minimal human annotation. We evaluate LGIWL on a real-world text classification task involving Russian-language customer service dialogues. Our method significantly outperforms keyword-based Snorkel heuristics, zero-shot prompting with GPT-4, and even a supervised CatBoost classifier trained on a full labeled dev set. In particular, LGIWL achieves strong recall while yielding a notable improvement in precision, resulting in a final F1 score of 0.863 with a RuModernBERT classifier, which demonstrates both robustness and practical scalability.

Keywords: weak supervision, financial sector, LLM

DOI: 10.15514/ISPRAS-2025-37(6)-20
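
The feedback loop outlined in the abstract (propose labeling functions, filter them on a small dev set, aggregate their votes over unlabeled text, train a classifier, feed its errors back) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`keyword_lf`, `filter_lfs`, `majority_vote`, `refine`, the 0.8 precision threshold) is hypothetical, the LLM and the classifier are passed in as plain callables, a simple majority vote stands in for the generative label model, and measuring errors on the dev set is an assumption.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

ABSTAIN = -1  # conventional "no vote" value for a labeling function
LabelingFunction = Callable[[str], int]
Example = Tuple[str, int]  # (text, gold label)


def keyword_lf(keyword: str, label: int) -> LabelingFunction:
    """Hypothetical LF: vote `label` if `keyword` occurs in the text, else abstain."""
    def lf(text: str) -> int:
        return label if keyword in text.lower() else ABSTAIN
    return lf


def dev_precision(lf: LabelingFunction, dev_set: Sequence[Example]) -> Optional[float]:
    """Precision of an LF over the dev examples on which it does not abstain."""
    fired = [(lf(text), gold) for text, gold in dev_set if lf(text) != ABSTAIN]
    if not fired:
        return None  # the LF never fired, so its precision is undefined
    return sum(vote == gold for vote, gold in fired) / len(fired)


def filter_lfs(candidates: Sequence[LabelingFunction],
               dev_set: Sequence[Example],
               min_precision: float = 0.8) -> List[LabelingFunction]:
    """Keep only candidates that fire on the dev set with enough precision."""
    kept = []
    for lf in candidates:
        precision = dev_precision(lf, dev_set)
        if precision is not None and precision >= min_precision:
            kept.append(lf)
    return kept


def majority_vote(lfs: Sequence[LabelingFunction], text: str) -> int:
    """Stand-in for a label model: most common non-abstain vote.

    Ties go to the label that was voted first.
    """
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


def refine(propose: Callable[[List[Example]], List[LabelingFunction]],
           train_and_predict: Callable[[List[Example]], Callable[[str], int]],
           unlabeled: Sequence[str],
           dev_set: Sequence[Example],
           rounds: int = 3) -> List[LabelingFunction]:
    """Iterate: propose LFs from errors, filter, weakly label, train, collect errors."""
    lfs: List[LabelingFunction] = []
    errors: List[Example] = []
    for _ in range(rounds):
        # `propose` plays the role of the LLM; it sees the latest errors.
        for lf in filter_lfs(propose(errors), dev_set):
            if lf not in lfs:
                lfs.append(lf)
        weak = [(text, majority_vote(lfs, text)) for text in unlabeled]
        train = [(text, label) for text, label in weak if label != ABSTAIN]
        predict = train_and_predict(train)
        # Assumption: classifier errors are measured on the dev set.
        errors = [(text, gold) for text, gold in dev_set if predict(text) != gold]
        if not errors:
            break
    return lfs
```

In a real pipeline the `propose` callable would wrap a prompt to an LLM and `train_and_predict` would fit a discriminative model on the weakly labeled examples. Both are kept abstract here so that the sketch runs without external services.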