
Vestn. Astrakhan State Technical Univ. Ser. Management, Computer Sciences and Informatics, 2023 Number 2, Pages 85–92 (Mi vagtu755)

This article is cited in 1 paper

COMPUTER SOFTWARE AND COMPUTING EQUIPMENT

Simulation of process of symbol recognition in regulating documents of organization

T. V. Khomenko, A. A. Irgaliev, V. D. Tarakanov

Astrakhan State Technical University, Astrakhan, Russia

Abstract: The purpose of modeling optical character recognition (OCR) is to improve the quality of document classification. Non-digital documents, such as scanned or photographed ones, are difficult to classify correctly in electronic document management systems. It was therefore decided to simulate the process of optical character recognition in the regulatory documents of an organization. Various methods of modeling the process are considered. The structure of departments for the electronic document management system is given. Methods of implementing OCR are considered. The stages of OCR system development are identified: image processing, segmentation, and recognition. The methods of image processing are analyzed. The main operations involved in image processing are described: alignment, blurring, binarization, contour detection, and removal of extra lines. Image blurring methods are compared. Two stages of image binarization are defined: conversion of a color image to grayscale, and binarization of the grayscale image. The Canny operator, which detects edges in the image, is proposed for the second stage of binarization. The last stage of image processing is the removal of extra lines. Algorithms for dividing text areas into segments are considered. Three stages of segmentation are identified: line segmentation, word segmentation, and character segmentation. A segmentation algorithm is defined that calculates the average brightness of image pixels to find the different intervals: line spacing, word spacing, and character spacing. Popular online OCR services and some popular desktop programs are reviewed. A connection is established between artificial neural networks and optical object recognition. An artificial neural network is proposed for implementing the recognition stage.
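The brightness-based segmentation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed threshold of 128, the background level of 250, and the `min_gap` parameter are all assumptions made for illustration, and a plain threshold stands in for the Canny-based second binarization stage that the abstract proposes. The idea shown is that averaging pixel brightness along rows reveals line spacing, while averaging along columns reveals character and word spacing.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of brightness values, 0 (black) to 255 (white)


def to_gray(rgb: List[List[Tuple[int, int, int]]]) -> Gray:
    """Stage 1 of binarization: colour to gray (ITU-R BT.601 luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def binarize(gray: Gray, threshold: int = 128) -> List[List[int]]:
    """Stage 2, simplified: 1 marks ink (dark pixel), 0 marks background.
    Assumption: a fixed threshold replaces the Canny operator here."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def mean_profile(gray: Gray, axis: int) -> List[float]:
    """Average brightness per row (axis=0) or per column (axis=1)."""
    if axis == 0:
        return [sum(row) / len(row) for row in gray]
    return [sum(col) / len(col) for col in zip(*gray)]


def find_runs(profile: List[float], background: float = 250.0,
              min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges darker than `background`.
    Neighbouring ranges separated by fewer than `min_gap` bright
    positions are merged, so a larger `min_gap` yields words, not characters."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value < background and start is None:
            start = i
        elif value >= background and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


# Hypothetical 5x12 gray "page": two text lines, the first holding two words.
page: Gray = [
    [255] * 12,
    [255, 0, 0, 255, 0, 0, 255, 255, 255, 0, 0, 255],
    [255, 0, 0, 255, 0, 0, 255, 255, 255, 0, 0, 255],
    [255] * 12,
    [255, 0, 0] + [255] * 9,
]

lines = find_runs(mean_profile(page, axis=0))        # line segmentation
first = page[lines[0][0]:lines[0][1]]
columns = mean_profile(first, axis=1)
chars = find_runs(columns, min_gap=1)                # character segmentation
words = find_runs(columns, min_gap=2)                # word segmentation
print(lines, chars, words)
```

In this sketch the same column profile serves both word and character segmentation; only the gap width that counts as a separator changes. On real scans the background level and gap widths would have to be estimated from the image rather than fixed.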

Keywords: image processing, segmentation, character recognition, binarization, blurring, image contours, artificial neural network, document classification, document flow, document familiarization, organization, management.

UDC: 651.4

Received: 30.11.2022
Accepted: 24.04.2023

DOI: 10.24143/2072-9502-2023-2-85-92


