The quality of training datasets for deep neural networks is a key factor in the accuracy of the resulting models. This effect is amplified in difficult tasks such as object detection. Dealing with dataset errors is often limited to accepting that some fraction of examples is incorrect, estimating their confidence, and either down-weighting uncertain examples or ignoring them during training. In this work, we propose a different approach. We introduce the Confident Learning for Object Detection (CLOD) algorithm for assessing the quality of each label in object detection datasets, identifying missing, spurious, mislabeled, and mislocated bounding boxes, and suggesting corrections. By focusing on finding incorrect examples in the training datasets, we can eliminate them at the root. Suspicious bounding boxes can be reviewed in order to improve the quality of the dataset, leading to better models without further complicating their already complex architectures. The proposed method can point out 99% of artificially disturbed bounding boxes with a false positive rate below 0.3. We see this method as a promising path to correcting popular object detection datasets.
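The abstract does not spell out the algorithm, but confident-learning approaches generally work by comparing annotations against out-of-sample model predictions. The sketch below illustrates that general idea for the four error categories named above; it is not the published CLOD procedure. The `Box` class, the thresholds `iou_hi`, `iou_lo`, and `score_min`, and the mapping from IoU/label agreement to verdicts are all illustrative assumptions.

```python
# Minimal sketch: flag suspicious bounding boxes by comparing annotations
# with predictions from a model that never saw these images during training
# (e.g. cross-validation folds). NOT the published CLOD algorithm; the
# thresholds and matching rules below are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    score: float = 1.0  # annotations carry no model confidence

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def flag_annotations(annotations, predictions,
                     iou_hi=0.5, iou_lo=0.1, score_min=0.5):
    """Return (box, verdict) pairs for one image's annotations.

    High-IoU match with agreeing label -> "ok"; with disagreeing label
    -> "mislabeled"; partial overlap -> "mislocated"; no confident
    overlapping prediction -> "spurious?". Confident predictions left
    unmatched by any annotation are flagged as possibly "missing?".
    """
    confident = [p for p in predictions if p.score >= score_min]
    matched = set()
    verdicts = []
    for ann in annotations:
        best, best_iou = None, 0.0
        for i, pred in enumerate(confident):
            ov = iou(ann, pred)
            if ov > best_iou:
                best, best_iou = i, ov
        if best is not None and best_iou >= iou_hi:
            matched.add(best)
            same = confident[best].label == ann.label
            verdicts.append((ann, "ok" if same else "mislabeled"))
        elif best is not None and best_iou >= iou_lo:
            matched.add(best)
            verdicts.append((ann, "mislocated"))
        else:
            verdicts.append((ann, "spurious?"))
    for i, pred in enumerate(confident):
        if i not in matched:
            verdicts.append((pred, "missing?"))
    return verdicts
```

In this framing, flagged boxes are surfaced for human review rather than silently dropped or re-weighted, matching the abstract's emphasis on correcting the dataset at the root rather than complicating the model.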