Deep learning has achieved remarkable success in numerous domains with the help of large amounts of data. However, label quality is a concern because high-quality labels are scarce in many real-world scenarios. Because noisy labels severely degrade the generalization performance of deep neural networks, learning from noisy labels (robust training) is becoming an important task in modern deep learning applications. In this survey, we first describe the problem of learning with label noise from a supervised learning perspective. Next, we provide a comprehensive review of 57 state-of-the-art robust training methods, all of which are categorized into five groups according to their methodological differences, followed by a systematic comparison of six properties used to evaluate their superiority. Subsequently, we perform an in-depth analysis of noise rate estimation and summarize the commonly used evaluation methodology, including public noisy datasets and evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for future studies. All the contents will be available at https://github.com/songhwanjun/Awesome-Noisy-Labels.