Abstract:
This paper investigates cell recognition in table images, using the Russian 2-NDFL tax document as an example. Although the tables have a simple structure, the form is printed from a flexible template. This flexibility affects both the textual information and the table area, where the number and size of columns can vary. A structural method is proposed for table detection. Its input is the set of detected horizontal and vertical line segments. The segments were detected by the Smart Document Reader system, in which the method was also implemented and tested. In addition to detecting the area where tables can be placed, the following objectives were achieved: searching for table cells, naming table cells, and validating the table area. Validation of the table area was performed both for separate tables and for table sets. The use of aggregate table descriptions showed high reliability in linking table sets.
Keywords: table recognition, line detection, table layout.
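
As a rough illustration of the cell search and cell naming steps mentioned in the abstract, the following minimal sketch builds a grid of named cells from horizontal and vertical segments. It is not the method implemented in Smart Document Reader. The segment representation, the helper names `merge_close` and `find_cells`, the tolerance `tol`, and the `r<row>c<col>` naming scheme are all assumptions made for this example, and the sketch assumes a fully ruled rectangular grid.

```python
from typing import Dict, List, Tuple

# Hypothetical segment formats (assumptions, not taken from the paper):
HSeg = Tuple[int, int, int]  # horizontal segment: (y, x_start, x_end)
VSeg = Tuple[int, int, int]  # vertical segment:   (x, y_start, y_end)
Box = Tuple[int, int, int, int]  # cell: (left, top, right, bottom)


def merge_close(values: List[int], tol: int) -> List[int]:
    """Keep one coordinate per group; a value is dropped if it lies
    within tol of the most recently kept coordinate."""
    merged: List[int] = []
    for v in sorted(values):
        if not merged or v - merged[-1] > tol:
            merged.append(v)
    return merged


def find_cells(h_segs: List[HSeg], v_segs: List[VSeg], tol: int = 3) -> Dict[str, Box]:
    """Form cells between adjacent ruling lines and name them by row and column."""
    ys = merge_close([y for y, _, _ in h_segs], tol)  # row boundaries
    xs = merge_close([x for x, _, _ in v_segs], tol)  # column boundaries
    cells: Dict[str, Box] = {}
    for r in range(len(ys) - 1):
        for c in range(len(xs) - 1):
            cells[f"r{r}c{c}"] = (xs[c], ys[r], xs[c + 1], ys[r + 1])
    return cells


# Example: three ruling rows (one detected twice) and three ruling columns.
h = [(10, 0, 200), (11, 0, 200), (50, 0, 200), (90, 0, 200)]
v = [(0, 10, 90), (100, 10, 90), (200, 10, 90)]
cells = find_cells(h, v)
```

Because the number of columns is derived from the detected vertical segments rather than fixed in advance, a sketch of this kind tolerates the variable column count and width described above; a real system would additionally check that each cell border is actually covered by a segment.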