
Comp. nanotechnol., 2025, Volume 12, Issue 4, Pages 13–19 (Mi cn588)

MATHEMATICAL MODELING, NUMERICAL METHODS AND COMPLEX PROGRAMS

Triplet-based knowledge mining using pretrained large language models

B. R. Zinnurov, Z. M. Gizatullin

Kazan National Research Technical University named after A.N. Tupolev – KAI

Abstract: Extracting structured information from text is a key task in natural language processing. Large language models achieve high accuracy on information extraction tasks thanks to pre-training on vast amounts of data. However, such models demand significant computational resources and are often unavailable for local use because they depend on cloud infrastructure. Compact, open-source large language models that can be fine-tuned locally are therefore increasingly used to address this problem. This paper evaluates the effectiveness of fine-tuning compact large language models for automated triplet extraction from unstructured text. The study used the Mistral model with seven billion parameters, fine-tuned on a custom dataset of 650 examples, each consisting of an instruction, an input text, and an expected output. The results confirm the effectiveness of fine-tuning: the F1-score increased several-fold compared to the baseline model. The fine-tuned model is competitive with the large-scale DeepSeek language model with 685 billion parameters. These results highlight the potential of compact, open large language models for knowledge extraction tasks, such as knowledge graph construction, under resource constraints.
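
The abstract describes an instruction-tuning dataset in which each of the 650 examples contains an instruction, an input text, and an expected output. The paper does not publish its dataset format, prompt template, or code; the field names, prompt layout, and example triplet below are illustrative assumptions. A minimal Python sketch of one common way to store such a record as JSON Lines and render it into a single supervised fine-tuning prompt:

    # Illustrative sketch only: the field names, prompt template, and example
    # triplet are assumptions, not the authors' actual dataset format.
    import json

    # A hypothetical training record: instruction, input text, expected triplets.
    record = {
        "instruction": "Extract (subject, predicate, object) triplets from the text.",
        "input": "Kazan National Research Technical University is located in Kazan.",
        "output": "(Kazan National Research Technical University; located in; Kazan)",
    }

    def to_prompt(rec: dict) -> str:
        """Join one record into the prompt/response string used for supervised fine-tuning."""
        return (
            f"### Instruction:\n{rec['instruction']}\n\n"
            f"### Input:\n{rec['input']}\n\n"
            f"### Response:\n{rec['output']}"
        )

    # Write a one-record JSONL dataset and print the rendered training prompt.
    with open("triplet_dataset.jsonl", "w", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

    print(to_prompt(record))

Such a JSONL file can then be fed to any standard supervised fine-tuning pipeline; the "### Instruction / Input / Response" layout is only one widely used convention.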

Keywords: large language model, fine-tuning, instruction tuning, triplet extraction, knowledge graph.

UDC: 303.732;004.94;004.8

DOI: 10.33693/2313-223X-2025-12-4-13-19


