Abstract:
A new structure for representing large vocabularies of highly inflective languages is sketched. Rich morphology complicates the parsing of text and speech. To improve performance, a two-level morpho-phonetic prefix graph is proposed for vocabulary representation. Sharing the identical beginnings and endings of different words significantly reduces the search space for a large vocabulary. A stem-based language model reduces the complexity of continuous speech decoding and addresses the data scarcity problem for inflective languages. In a comparison with two baseline word lattice models, the proposed graph showed a significant reduction in topological complexity.
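
The effect of sharing word beginnings and endings can be illustrated with a minimal sketch. It is not the proposed graph itself: it only counts nodes, it uses letters as a stand-in for phonetic units, and it does not record which endings are valid for which stems. The toy lexicon, the stem/ending split, and the helper name `trie_size` are all assumptions made for illustration.

```python
def trie_size(strings):
    """Count the nodes (excluding the root) of a prefix tree built over strings."""
    root = {}
    count = 0
    for s in strings:
        node = root
        for ch in s:
            if ch not in node:
                node[ch] = {}  # new node: this prefix was not seen before
                count += 1
            node = node[ch]
    return count


# Hypothetical toy lexicon given as (stem, ending) pairs.
lexicon = [
    ("walk", "s"), ("walk", "ed"), ("walk", "ing"),
    ("talk", "s"), ("talk", "ed"), ("talk", "ing"),
    ("wall", "s"), ("wall", "ed"),
]

words = [stem + ending for stem, ending in lexicon]

# Baseline 1: one separate linear chain of units per word (no sharing).
flat_nodes = sum(len(w) for w in words)

# Baseline 2: a prefix tree over whole words (shared beginnings only).
word_trie_nodes = trie_size(words)

# Two-level sketch: one prefix tree over stems plus one shared tree over endings.
stem_nodes = trie_size({stem for stem, _ in lexicon})
ending_nodes = trie_size({ending for _, ending in lexicon})
two_level_nodes = stem_nodes + ending_nodes

print(flat_nodes, word_trie_nodes, two_level_nodes)
```

In this toy case the node count drops at each step, because every ending is stored once instead of once per stem. A real implementation would also need arcs linking each stem to its permitted endings.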