Identifying Label Errors in Object Detection Datasets by Loss Inspection

Labeling datasets for supervised object detection is a dull and time-consuming task. Errors are easily introduced during annotation and overlooked during review, yielding inaccurate benchmarks and degrading the performance of deep neural networks trained on noisy labels. In this work, we introduce, for the first time, a benchmark for label error detection methods on object detection datasets, together with a label error detection method and a number of baselines. We simulate four different types of randomly introduced label errors on the train and test sets of well-labeled object detection datasets. For our label error detection method, we assume a two-stage object detector to be given and consider the sum of both stages' classification and regression losses. The losses are computed with respect to the predictions and the noisy labels, which include the simulated label errors, with the aim of detecting the latter. We compare our method to three baselines: a naive one without deep learning, the object detector's score, and the entropy of the classification softmax distribution. We outperform all baselines and demonstrate that, among the considered methods, ours is the only one that detects label errors of all four types efficiently. Furthermore, we detect real label errors a) on commonly used test datasets in object detection and b) on a proprietary dataset. In both cases we achieve low false-positive rates: when considering 200 proposals from our method, we detect label errors with a precision of up to 71.5% for a) and of 97% for b).
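The ranking idea described in the abstract can be illustrated with a minimal sketch. It assumes that the four per-label loss terms of a two-stage detector (classification and regression for each stage) have already been computed against the noisy labels and are available as arrays. The function names `loss_score`, `softmax_entropy`, and `top_proposals`, as well as all numeric values, are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def loss_score(stage1_cls, stage1_reg, stage2_cls, stage2_reg):
    """Sum both stages' classification and regression losses per label.

    Each argument is a 1-D array with one loss value per annotated box
    (assumed to be precomputed by some two-stage detector).
    """
    terms = [stage1_cls, stage1_reg, stage2_cls, stage2_reg]
    return sum(np.asarray(t, dtype=float) for t in terms)


def softmax_entropy(probs, eps=1e-12):
    """Entropy of a classification softmax distribution (baseline score)."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + eps), axis=-1)


def top_proposals(scores, k):
    """Indices of the k highest-scoring labels, highest first."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return order[:k]


# Hypothetical loss values for four annotated boxes.
stage1_cls = [0.10, 0.90, 0.20, 0.10]
stage1_reg = [0.05, 0.40, 0.10, 0.05]
stage2_cls = [0.20, 1.50, 0.30, 2.00]
stage2_reg = [0.10, 0.60, 0.10, 0.10]

scores = loss_score(stage1_cls, stage1_reg, stage2_cls, stage2_reg)
proposals = top_proposals(scores, k=2)  # labels to hand to a human reviewer
```

Labels with a high summed loss are the ones where the detector's prediction and the annotation disagree most, so they are reviewed first. Precision at a fixed number of proposals (200 in the abstract) is then the fraction of reviewed proposals that turn out to be real label errors.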
