I. S. Kipyatkova, “Software for Creation of Syntactico-Statistical Russian Language Model Based on the Text Corpus”, Tr. SPIIRAN, 2013, Issue 24, Pages 332

Software for Creation of Syntactico-Statistical Russian Language Model Based on the Text Corpus

I. S. Kipyatkova

St. Petersburg Institute for Informatics and Automation of RAS

Abstract: Creating a language model is one of the stages in training a continuous speech recognition system. This paper describes software developed for creating a syntactico-statistical Russian language model from a text corpus. The main stages of the algorithm are preliminary processing of the text material, creation of a statistical n-gram language model, and extension of the statistical model with n-grams obtained by syntactic analysis. Syntactic analysis increases the number of distinct bigrams produced during text processing and improves the quality of the language model by extracting grammatically connected word pairs. The results of testing the language models created with this software module are presented.
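The abstract gives no implementation details, so the following Python code is only a minimal sketch of the general idea under stated assumptions, not the author's method. It counts ordinary bigrams from neighbouring words and then adds bigrams built from grammatically connected word pairs. The normalization in `preprocess`, the `(i, j)` index format of the parser output, and the rule of keeping only non-adjacent connected pairs are all assumptions. The Russian example sentence and its dependency links are hand-made for illustration, and no real parser is called.

```python
from collections import Counter

def preprocess(text):
    # Illustrative normalization only (assumption): lowercase, keep letters.
    tokens = []
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch.isalpha())
        if word:
            tokens.append(word)
    return tokens

def adjacent_bigrams(tokens):
    # Ordinary statistical bigrams: pairs of neighbouring words.
    return Counter(zip(tokens, tokens[1:]))

def syntactic_bigrams(tokens, links):
    # links: (i, j) token-index pairs of grammatically connected words,
    # assumed to be supplied by some external syntactic parser (not shown).
    pairs = Counter()
    for i, j in links:
        a, b = sorted((i, j))
        if b - a > 1:  # assumption: adjacent pairs are already counted above
            pairs[(tokens[a], tokens[b])] += 1
    return pairs

sentence = "Мальчик быстро читает интересную книгу."
tokens = preprocess(sentence)
# Hypothetical parse: (subject, verb), (adverb, verb), (verb, object), (adjective, noun)
links = [(0, 2), (1, 2), (2, 4), (3, 4)]

statistical = adjacent_bigrams(tokens)
combined = statistical + syntactic_bigrams(tokens, links)
print(len(statistical), len(combined))  # prints "4 6"
```

In this toy sentence the subject-verb pair ("мальчик", "читает") and the verb-object pair ("читает", "книгу") are separated by other words. They therefore never appear as adjacent bigrams, and adding them raises the number of distinct bigrams from 4 to 6. This mirrors the abstract's claim that syntactic analysis increases the number of distinct bigrams.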

Keywords: automatic speech recognition, statistical language model, syntactic analysis.

UDC: 004.522

PACS: 43.71.Sy

MSC: 68T50

Received: 01.02.2013