Abstract:
The paper is devoted to the problem of clustering financial message texts with machine learning algorithms. Clustering can be used to find groups of similar financial messages, to detect messages of the same type or suspicious ones, and to use the resulting clusters instead of the raw message texts in further analysis. The work applies K-means, DBSCAN and hierarchical clustering. Descriptions of bank transactions serve as the financial message texts. Because bank transactions are subject to strict accounting rules established by the Bank of Russia, a metric for assessing clustering quality can be introduced. This metric makes it possible to rank the clustering results produced by different machine learning algorithms and to select the parameters used in training these models. Special attention is paid to the specifics of the data and to how these features can be taken into account in the practical part. The practical part presents the results of the clustering models together with the optimal parameters of these algorithms. The conclusion identifies the clustering algorithms that perform best on financial texts.
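The abstract does not define the quality metric. As an illustrative assumption only, suppose each transaction carries a reference category implied by the accounting rules, such as an account class. The labels below are hypothetical. Cluster purity, a standard external measure, could then score the output of any of the three algorithms. For each cluster, count the items that share its most common category, then divide the total by the number of items. This is a minimal sketch, not the paper's metric.

```python
from collections import Counter


def purity(cluster_labels, reference_labels):
    """Cluster purity: share of items whose reference category equals the
    majority category of their cluster. Used here only as a stand-in for
    the paper's (unspecified) accounting-based metric."""
    if len(cluster_labels) != len(reference_labels):
        raise ValueError("label sequences must have equal length")
    if not cluster_labels:
        raise ValueError("empty input")
    # Group reference categories by assigned cluster. A DBSCAN noise label
    # (-1 in scikit-learn) is treated here as just another cluster.
    members = {}
    for c, r in zip(cluster_labels, reference_labels):
        members.setdefault(c, []).append(r)
    # For each cluster, count items belonging to its most common category.
    matched = sum(Counter(refs).most_common(1)[0][1] for refs in members.values())
    return matched / len(cluster_labels)


# Hypothetical example: six transactions, cluster ids from any algorithm,
# and made-up reference categories standing in for account classes.
clusters = [0, 0, 0, 1, 1, 2]
categories = ["A", "A", "B", "B", "B", "C"]
print(purity(clusters, categories))  # 5 of 6 items match -> 0.8333333333333334
```

Purity rises toward 1 as the number of clusters grows. When it is used to choose parameters such as k in K-means or eps in DBSCAN, it should therefore be read alongside the number of clusters produced.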
Keywords: K-means, DBSCAN, hierarchical clustering method, clustering of financial messages.
UDC: 519.8 BBK: 22.18
Received: February 14, 2025 Published: July 31, 2025