Robustness to small image translations is a highly desirable property for object detectors. However, recent work has shown that CNN-based classifiers are not shift invariant. It is unclear to what extent this affects object detection, mainly because of the architectural differences between classifiers and detectors and the dimensionality of the prediction space of modern detectors. To assess the shift equivariance of object detection models end-to-end, in this paper we propose an evaluation metric built upon a greedy search for the lower and upper bounds of the mean average precision (mAP) on a shifted image set. Our new metric shows that modern object detection architectures, whether one-stage or two-stage, anchor-based or anchor-free, are sensitive to even a one-pixel shift of the input image. Furthermore, we investigate several possible solutions to this problem, both taken from the literature and newly proposed, and quantify the effectiveness of each with the proposed metric. Our results indicate that none of these methods provides full shift equivariance. Measuring and analyzing the extent of shift variance in different models, and the contributions of possible factors, is a first step towards devising methods that mitigate or even leverage such variability.
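To make the idea of greedily searching for lower and upper bounds on a shifted image set more concrete, the following is a minimal sketch and not the procedure used in the paper, whose details are not given in this abstract. It assumes that each image may be assigned one shift from a small candidate set, and that a dataset-level metric can be re-evaluated for any assignment of shifts to images. The names `greedy_bound`, `shift_gap`, `toy_metric` and `TOY_SCORES` are hypothetical, and the toy metric is a simple mean of made-up per-image scores standing in for mAP.

```python
from typing import Callable, Dict, Sequence, Tuple

Shift = Tuple[int, int]        # (dx, dy) translation in pixels
Assignment = Dict[str, Shift]  # one chosen shift per image id


def greedy_bound(
    image_ids: Sequence[str],
    shifts: Sequence[Shift],
    dataset_metric: Callable[[Assignment], float],
    maximize: bool,
) -> Tuple[float, Assignment]:
    """Greedily choose one shift per image to push a dataset-level metric
    up (maximize=True) or down (maximize=False).

    Assumption: this one-pass, image-by-image scheme is only one plausible
    reading of "greedy search"; it is not taken from the paper.
    """
    # Start with the first candidate shift for every image.
    choice: Assignment = {img: shifts[0] for img in image_ids}
    best_value = dataset_metric(choice)
    for img in image_ids:
        best_shift = choice[img]
        for s in shifts:
            trial = dict(choice)
            trial[img] = s
            value = dataset_metric(trial)
            improved = value > best_value if maximize else value < best_value
            if improved:
                best_shift, best_value = s, value
        # Fix this image's shift before moving on to the next image.
        choice[img] = best_shift
    return best_value, choice


def shift_gap(
    image_ids: Sequence[str],
    shifts: Sequence[Shift],
    dataset_metric: Callable[[Assignment], float],
) -> Tuple[float, float, float]:
    """Return (lower bound, upper bound, gap) found by the greedy search."""
    lower, _ = greedy_bound(image_ids, shifts, dataset_metric, maximize=False)
    upper, _ = greedy_bound(image_ids, shifts, dataset_metric, maximize=True)
    return lower, upper, upper - lower


# Made-up per-image scores for each shift; a stand-in for real detector output.
TOY_SCORES: Dict[str, Dict[Shift, float]] = {
    "img_a": {(0, 0): 0.5, (1, 0): 0.3, (0, 1): 0.7},
    "img_b": {(0, 0): 0.6, (1, 0): 0.65, (0, 1): 0.2},
}


def toy_metric(assignment: Assignment) -> float:
    """Toy stand-in for mAP: the mean of the per-image scores."""
    return sum(TOY_SCORES[i][s] for i, s in assignment.items()) / len(assignment)


if __name__ == "__main__":
    ids = list(TOY_SCORES)
    candidate_shifts = [(0, 0), (1, 0), (0, 1)]
    print(shift_gap(ids, candidate_shifts, toy_metric))
```

With a real detector, `dataset_metric` would run detection on the shifted images and compute mAP over the whole set. Because mAP does not decompose into a per-image average, a greedy pass of this kind yields bounds that are reachable but not necessarily the true extremes. A fully shift-equivariant detector would give a gap of zero under this sketch.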