
Intelligent systems. Theory and applications, 2020 Volume 24, Issue 1, Pages 55–69 (Mi ista256)

Part 1. General problems of the intellectual systems theory

Completeness, stability and interpretability of probabilistic topic models

A. V. Sukhareva

Lomonosov Moscow State University

Abstract: Interpretable solutions, the possibility of unsupervised learning, and scalability have made topic modeling one of the most popular tools for statistical text analysis. Topic models reduce the dimension of the data space because they describe each document as a probabilistic mixture of abstract topics and each topic as a distribution over the words of the collection's vocabulary. The transition from the space of words to the space of topics leads to a natural solution of the problems of synonymy and polysemy of terms. However, topic models have a number of disadvantages caused by the dependence of the solution on the initialization. The instability of topic models is a well-known fact, but the related problem of completeness has not yet been studied in the literature. To solve this problem, the article explores a new algorithm for finding a complete set of topics based on constructing a convex hull. The effectiveness of this algorithm is confirmed experimentally. In practice, a complete set of topics was used to initialize the ARTM (additive regularization of topic models) model. Compared with a randomized initial approximation, initialization with the basis topics improves stability, perplexity by more than 10%, and coherence by several times.

Keywords: probabilistic topic modeling, stability of topic models, complete set of topics of topic models, latent Dirichlet allocation, LDA, regularization, ARTM, BigARTM.


