Abstract:
The per-frame combination of text recognition results from multiple images is an important component of video stream document recognition systems. Currently, there is no unified approach to this problem that yields high text recognition precision. This paper presents a comparative study of known approaches to combining recognition results for identity document fields. It is demonstrated that different approaches are advantageous on different parts of the data sets, while selecting the potentially best single result can still significantly outperform all of the analyzed methods.