Abstract:
This paper describes the creation of a statistical Russian language model for continuous speech recognition systems. The characteristics of the collected text corpus, which was gathered from the news sites of several online newspapers, are given, and a statistical analysis of the corpus is carried out. Unigram, bigram, and trigram Russian language models were built from the collected corpus. To assess the quality of these models, their entropy and perplexity were computed. The paper also surveys existing approaches to building statistical language models.
Keywords: statistical text processing, language model.