The quality of training datasets for deep neural networks is a key factor in the accuracy of the resulting models. This matters even more in difficult tasks such as object detection. Dealing with errors in these datasets has so far been limited either to accepting that some fraction of examples is incorrect, or to predicting their confidence and assigning appropriate weights during training. In this work, we propose a different approach. For the first time, we extend the confident learning algorithm to the object detection task. By finding incorrect labels in the original training datasets, we can eliminate erroneous examples at their root: suspicious bounding boxes can be re-annotated to improve the quality of the dataset itself, leading to better models without complicating their already complex architectures. We can effectively identify 99\% of artificially perturbed bounding boxes with an FPR below 0.3. We see this method as a promising path to correcting well-known object detection datasets.