Abstract:
This paper examines the limitations of modern data augmentation methods when applied to images captured by unmanned aerial vehicles in scenes characterized by high object density and small object sizes. A specialized method, Contextual Small-Object Augmentation, is proposed that places visually enhanced objects into semantically relevant regions of the image while preserving spatial realism. In particular, the study focuses on a data augmentation module that uses super-resolution (SR) networks to improve the visual quality of small objects. For this purpose, several state-of-the-art SR models were selected: RCAN, Real-ESRGAN, and SwinIR. Their impact on object detection and classification accuracy was evaluated using the SSD MobileNet V2 FPNLite $320\times320$ model trained on several versions of the VisDrone benchmark dataset. Detection results were compared against those of a baseline model trained on the original dataset, following the standard COCO evaluation protocol. The experimental results demonstrate that incorporating super-resolution networks into the augmentation pipeline significantly improves small-object detection accuracy while maintaining computational efficiency.
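The following short Python sketch (not taken from the paper) illustrates the augmentation step summarized above: a small-object crop is enhanced with an SR model and pasted into a semantically relevant region of the scene. The `sr_model.upscale` call and the `context_mask` input are illustrative assumptions standing in for whichever SR network and region-proposal logic are actually used.

```python
# Minimal sketch of contextual small-object augmentation: enhance a small
# object crop with a super-resolution model, then paste it into a region
# where the object class may plausibly appear. All interfaces here
# (sr_model.upscale, context_mask) are hypothetical placeholders.
import random
import numpy as np


def augment_with_sr_object(image: np.ndarray,
                           object_crop: np.ndarray,
                           context_mask: np.ndarray,
                           sr_model) -> np.ndarray:
    """Paste an SR-enhanced object crop into a semantically valid region.

    context_mask is a boolean HxW array marking pixels where the object
    class is plausible (e.g. road pixels for vehicles).
    """
    # 1. Improve the visual quality of the small object with the SR network.
    enhanced = sr_model.upscale(object_crop)           # hypothetical SR call
    oh, ow = enhanced.shape[:2]

    # 2. Sample a top-left corner inside the context mask that keeps the
    #    pasted object fully within the image bounds.
    h, w = image.shape[:2]
    ys, xs = np.where(context_mask[: h - oh, : w - ow])
    if len(ys) == 0:
        return image                                    # no valid placement
    idx = random.randrange(len(ys))
    y, x = int(ys[idx]), int(xs[idx])

    # 3. Blend the enhanced object into the scene (plain overwrite here;
    #    alpha or Poisson blending would preserve more spatial realism).
    out = image.copy()
    out[y:y + oh, x:x + ow] = enhanced
    return out
```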