Object detectors are vital to many modern computer vision applications, yet even state-of-the-art detectors are imperfect. Given two images that look nearly identical to the human eye, the same detector can make different predictions because of small image distortions such as camera sensor noise and lighting changes. This problem is called inconsistency. Existing accuracy metrics do not properly account for inconsistency, and related work targets improvements only on artificial image distortions. We therefore propose a method that uses non-artificial video frames to measure the consistency of object detection over time, across frames. Using this method, we show that the consistency of modern object detectors ranges from 83.2% to 97.1% on different video datasets from the Multiple Object Tracking Challenge. We conclude by showing that applying image distortion corrections such as WebP image compression and unsharp masking can improve consistency by as much as 5.1%, with no loss in accuracy.
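The abstract does not spell out how consistency across frames is computed, so the following is only a minimal sketch of one plausible formulation, not the authors' definition. It assumes that each frame comes with ground-truth boxes keyed by track ID (as tracking datasets typically provide), that an object counts as detected when some predicted box overlaps it with IoU of at least 0.5, and that a detector is consistent on an object when it either finds it in both of two consecutive frames or misses it in both. The names `iou`, `detected_ids`, and `consistency`, the threshold, and the toy data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detected_ids(gt: Dict[int, Box], dets: List[Box], thr: float = 0.5) -> Set[int]:
    """Track IDs whose ground-truth box is covered by at least one detection.

    Simplification (assumption): no one-to-one matching is enforced, so a
    single detection may cover more than one ground-truth object.
    """
    return {tid for tid, box in gt.items() if any(iou(box, d) >= thr for d in dets)}


def consistency(
    gt_frames: List[Dict[int, Box]],
    det_frames: List[List[Box]],
    thr: float = 0.5,
) -> float:
    """Fraction of (object, consecutive-frame-pair) checks where the detector
    behaves the same way in both frames (found in both, or missed in both)."""
    hits = [detected_ids(g, d, thr) for g, d in zip(gt_frames, det_frames)]
    agree = total = 0
    for t in range(len(hits) - 1):
        # Only objects present in both frames can be compared.
        shared = gt_frames[t].keys() & gt_frames[t + 1].keys()
        for tid in shared:
            total += 1
            agree += (tid in hits[t]) == (tid in hits[t + 1])
    return agree / total if total else 1.0


# Toy data: object 2 is detected in frame 0 but missed in frame 1.
gt_frames = [
    {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)},
    {1: (1, 0, 11, 10), 2: (20, 20, 30, 30)},
    {1: (2, 0, 12, 10)},
]
det_frames = [
    [(0, 0, 10, 10), (20, 20, 30, 30)],
    [(1, 0, 11, 10)],
    [(2, 0, 12, 10)],
]
print(consistency(gt_frames, det_frames))  # 2 of the 3 checks agree
```

Under these assumptions, a preprocessing step such as WebP compression or unsharp masking could be evaluated by applying it to every frame before detection and comparing the resulting score with the score on unprocessed frames.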