Abstract:
This paper addresses the problem of filtering false-positive detections in surveillance video streams. We propose two methods. The first automatically mines hard negatives from the video stream and uses them to fine-tune the baseline detector. The second filters the detector output by analyzing how frequently visually similar samples are detected. We demonstrate the proposed methods on cascade-based detectors, but they can be applied to any detector that can be trained in a reasonable amount of time. Experimental results show that the proposed methods improve both precision and recall and reduce computation time by 47%.
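
To make the second method concrete, the following is a minimal sketch of frequency-based output filtering, not the implementation evaluated in the paper. It assumes that each detection is represented by a fixed-length appearance descriptor, that visual similarity is measured by cosine similarity, and that a detection whose appearance has already recurred many times (for example, a static background pattern) is likely a false positive. The class name `FrequencyFilter`, the parameters `similarity_threshold` and `max_repeats`, and their default values are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class FrequencyFilter:
    """Hypothetical filter: suppress detections whose appearance recurs too often."""

    def __init__(self, similarity_threshold: float = 0.95, max_repeats: int = 3):
        # Assumed parameters; the values are illustrative, not taken from the paper.
        self.similarity_threshold = similarity_threshold
        self.max_repeats = max_repeats
        self.history: List[List[float]] = []

    def accept(self, descriptor: Sequence[float]) -> bool:
        """Return True if the detection is kept, False if it is filtered out."""
        # Count previously seen detections that look like this one.
        repeats = sum(
            1
            for past in self.history
            if cosine_similarity(descriptor, past) >= self.similarity_threshold
        )
        # Record the descriptor whether or not the detection is kept.
        self.history.append(list(descriptor))
        return repeats < self.max_repeats
```

In this sketch the history grows without bound and every lookup is linear in its size; a practical version would presumably limit the history to a sliding time window or use an approximate nearest-neighbor index.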