I. S. Kipyatkova, A. A. Karpov, “Development and Research of a Statistical Russian Language Model”, Tr. SPIIRAN, 2010, Issue 12, Pages 35

Development and Research of a Statistical Russian Language Model

I. S. Kipyatkova, A. A. Karpov

St. Petersburg Institute for Informatics and Automation of RAS

Abstract: The paper describes the creation of a statistical Russian language model for continuous speech recognition systems. The characteristics of the collected text corpus, which was compiled from the news websites of several online newspapers, are given, and a statistical analysis of this corpus is carried out. Unigram, bigram, and trigram Russian language models have been built on the basis of the collected corpus. To assess the quality of these models, their entropy and perplexity have been computed. The paper also surveys existing approaches to building statistical language models.

Keywords: statistical text processing, language model.

UDC: 004.522

Received: 16.11.2010
Accepted: 06.12.2010