Abstract:
The paper considers the problem of identifying the characters of a natural-language text from numerical characteristics of that text. The proposed solution for Russian texts is based on the rules of the language and on bigram frequencies. It consists of a system of identifying functions, one for each character of the alphabet, applied in a fixed, deterministic order. The limitations, efficiency, and possible extensions of the proposed solution are presented.
Keywords: identification; character; bigram; Russian language; one-to-one substitution.
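Bigram frequencies are useful here because of a standard property of one-to-one substitution. Each plaintext bigram (a, b) becomes the ciphertext bigram (f(a), f(b)) with exactly the same count, so bigram statistics of the substituted text can be compared with known statistics of the language. The minimal Python sketch below checks only that invariant. The function names, the toy key and the sample phrase are hypothetical illustrations and are not the identifying functions proposed in the paper.

```python
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count overlapping bigrams (pairs of adjacent characters) in `text`."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def substitute(text: str, key: dict[str, str]) -> str:
    """Apply a character substitution; characters missing from `key` pass through."""
    return "".join(key.get(ch, ch) for ch in text)


def bigrams_preserved(plain: str, key: dict[str, str]) -> bool:
    """Check that each plaintext bigram count reappears under its relabelled bigram.

    This holds for any one-to-one key; a key that merges characters can break it.
    """
    cipher_counts = bigram_counts(substitute(plain, key))
    return all(
        cipher_counts[key.get(bg[0], bg[0]) + key.get(bg[1], bg[1])] == n
        for bg, n in bigram_counts(plain).items()
    )


if __name__ == "__main__":
    # Hypothetical one-to-one key: a cyclic shift over a few Cyrillic letters and the space.
    letters = "амрылу "
    key = {c: letters[(i + 1) % len(letters)] for i, c in enumerate(letters)}
    plain = "мама мыла раму"
    print(substitute(plain, key))
    print(bigrams_preserved(plain, key))  # True
```

Because the substitution only relabels symbols, the multiset of bigram counts is unchanged. What remains unknown is which ciphertext symbol corresponds to which letter, and that correspondence is what identifying functions of the kind described above must resolve.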