Probl. Peredachi Inf., 2017 Volume 53, Issue 3, Pages 100–111 (Mi ppi2248)

This article is cited in 8 papers

Source Coding

Information-theoretic method for classification of texts

B. Ya. Ryabko [a, b], A. E. Gus'kov [c, a], I. V. Selivanova [a, b, c]

a Institute of Computational Technologies, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
b Novosibirsk State University, Novosibirsk, Russia
c Russian National Public Library for Science and Technology, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia

Abstract: We consider a method for automatic text classification, i.e., classification without human involvement, based on universal source coding (“data compression”). We show that, under certain restrictions, the proposed method is consistent: the classification error tends to zero as the text length increases. As a practical example, we consider the classification of scientific texts (research papers, books, etc.). Experiments show the proposed method to be highly efficient.
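
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of compression-based classification, not the authors' algorithm. It assumes zlib as a stand-in for a universal code, and the names `compressed_len` and `classify` are hypothetical. A text is assigned to the class whose sample, when the text is appended to it, grows the least in compressed size.

```python
import zlib
from typing import Mapping


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (assumed stand-in for a universal code)."""
    return len(zlib.compress(data, 9))


def classify(text: str, class_samples: Mapping[str, str]) -> str:
    """Return the label whose sample text 'explains' the new text best.

    For each class, measure how many extra compressed bytes are needed
    when the text is appended to that class's sample, and pick the
    class with the smallest increase.
    """
    t = text.encode("utf-8")

    def extra_cost(label: str) -> int:
        s = class_samples[label].encode("utf-8")
        return compressed_len(s + t) - compressed_len(s)

    return min(class_samples, key=extra_cost)


# Toy usage with two hypothetical class samples.
samples = {
    "physics": "the quantum field theory of electrons and photons describes scattering amplitudes",
    "biology": "the cell membrane, protein folding and gene expression in bacteria under stress",
}
label = classify("quantum field theory of electrons and photons", samples)
```

In this sketch the samples are tiny, so the result is only illustrative; the consistency claim in the abstract concerns the limit of increasing text lengths, and zlib's 32 KB window would limit how much of a long sample can actually be used.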

UDC: 621.391.1+519.72

Received: 21.10.2015
Revised: 13.05.2017


English version:
Problems of Information Transmission, 2017, 53:3, 294–304
