
Informatics and Automation, 2025, Volume 24, Issue 6, Pages 1623–1648 (Mi trspy1401)

Artificial Intelligence, Knowledge and Data Engineering

The method for integrating large language models into algorithms for focused monitoring of open social media data

A. Fedorov (a, b), I. Datyev (b), I. Vishnyakov (b)

a Apatity branch of MAU
b IIMM KSC RAS

Abstract: The study is motivated by the importance and difficulty of rapidly summarizing the vast volume of user-generated content on social networks. The authors propose to reduce the complexity of the problem by using robotic algorithms whose focus is adjusted automatically and intelligently according to the specific platform, data availability, and data volume. The paper examines the ability of large language models (LLMs) to generate high-quality, coherent, and context-sensitive annotations (summaries) suited to the dynamic nature of unstructured, noisy social network data. The features of retrieval-augmented generation (RAG) with LLMs for summarizing social network publications are presented. The main disadvantages of language models are the instability of their output and the difficulty of tracing the results to confirm factual accuracy. The authors propose a hybrid method for summarizing social media posts published over a given period of time. The method combines, in a complex and variable way, classical methods for extracting data from repositories with the abstractive and generative capabilities of LLMs. LLMs are used to vectorize the analyzed data, and applying clustering algorithms to the resulting vector representations increased the stability and quality of the results. Within RAG, the capabilities of the LLMs are extended by intelligent search in the MongoDB database that stores the original data. The paper presents three pipelines, each a variant implementation of the method with its own advantages and disadvantages under different application conditions. The metrics used to evaluate the pipelines are given, and a comparative analysis is performed. Overall, the method reduces the confabulations of a large language model and produces annotations of publications for different time periods in real time.
The proposed method is used in practice in the open social media data monitoring system developed by the authors.
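
The vectorize-then-cluster step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the embedding model or the clustering algorithm, so a toy bag-of-words vector stands in for LLM embeddings and plain k-means stands in for the clustering step. The MongoDB retrieval and the LLM summarization call are omitted, and every function name and sample post below is hypothetical.

```python
import math

def tokenize(text):
    return text.lower().split()

def embed_all(posts):
    """Toy stand-in for LLM vectorization: L2-normalized word counts."""
    vocab = sorted({w for p in posts for w in tokenize(p)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for p in posts:
        v = [0.0] * len(vocab)
        for w in tokenize(p):
            v[index[w]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [vectors[0][:]]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(dist2(v, c) for c in centroids))
        centroids.append(far[:])
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(v, centroids[j]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def representatives(vectors, labels, centroids):
    """For each cluster, return the index of the post nearest its centroid."""
    reps = {}
    for j, c in enumerate(centroids):
        members = [i for i, l in enumerate(labels) if l == j]
        if members:
            reps[j] = min(members, key=lambda i: dist2(vectors[i], c))
    return reps

# Hypothetical sample posts covering two topics.
posts = [
    "road closed snow storm",
    "snow storm road blocked",
    "storm snow closed road",
    "festival concert tickets sale",
    "concert tickets festival tonight",
    "tickets sale festival concert",
]
vectors = embed_all(posts)
labels, centroids = kmeans(vectors, k=2)
reps = representatives(vectors, labels, centroids)
# One representative post per cluster; in a RAG setting these would be
# passed to the language model as grounding context for the summary.
context = [posts[i] for i in reps.values()]
```

Grouping posts before summarization means the model sees one representative per topic rather than the whole noisy stream, which is one plausible reading of why clustering is reported to improve the stability of the results.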

Keywords: social media, posts, text summarization, LLMs, RAG, AI agents, hybrid method.

UDC: 004.8

Received: 20.08.2025

DOI: 10.15622/ia.24.6.4


