Abstract:
This article addresses the task of localizing a Russian Federation citizen's passport in photographs where the document occupies only a small portion of the frame. The problem is particularly relevant for remote verification systems that require users to upload a selfie with their passport. At such a small scale the document appears at low resolution, which complicates both its localization and its recognition. To improve localization accuracy, an ultra-lightweight neural network model, YOLO-Passport, is proposed for localizing the passport region, which reduces the problem to a fixed document scale. Compared to compact YOLO detectors, YOLO-Passport requires an order of magnitude fewer operations and parameters. The proposed approach increases the detection recall for Russian passports from 91.6% to 97.4%. The model's inference time on a CPU is 3 ms, and its size in 8-bit format is only 340 KB, which makes it well suited to deployment in industrial systems and WASM-based web applications.
Keywords: document recognition, deep learning, object detection, YOLO.