Image Classification with Deep Learning in the Presence of Noisy Labels: A Survey

Image classification systems have recently made a giant leap with the advancement of deep neural networks. However, these systems require large amounts of labeled data to be trained adequately. Gathering a correctly annotated dataset is not always feasible due to several factors, such as the cost of the labeling process or the difficulty of classifying data correctly, even for experts. Because of these practical challenges, label noise is a common problem in real-world datasets, and numerous methods for training deep neural networks with label noise have been proposed in the literature. Although deep neural networks are known to be relatively robust to label noise, their tendency to overfit makes them vulnerable to memorizing even random noise. Therefore, it is crucial to account for the existence of label noise and to develop algorithms that counteract its adverse effects in order to train deep neural networks efficiently. Even though an extensive survey of machine learning techniques under label noise exists, the literature lacks a comprehensive survey of methodologies centered explicitly on deep learning in the presence of noisy labels. This paper presents these algorithms and categorizes them into two subgroups: noise-model-based and noise-model-free methods. Algorithms in the first group aim to estimate the noise structure and use this information to avoid the adverse effects of noisy labels. In contrast, methods in the second group try to develop inherently noise-robust algorithms by using approaches such as robust losses, regularizers, or other learning paradigms.
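The two categories can be contrasted with a minimal NumPy sketch. The abstract does not prescribe specific methods, so the choices here are illustrative assumptions. Forward loss correction with a known noise transition matrix `T` (where `T[i, j]` is the probability that clean class `i` is observed as label `j`) stands in for the noise-model-based group. The mean absolute error (MAE) loss, a commonly cited noise-robust loss, stands in for the noise-model-free group. The function names `forward_corrected_ce`, `mae_loss`, and `symmetric_T`, and the symmetric noise model, are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels, eps=1e-12):
    # Mean negative log-likelihood of the observed labels under `probs`.
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + eps)))

def forward_corrected_ce(logits, noisy_labels, T):
    # Noise-model-based (assumed example): the network predicts clean-class
    # probabilities; multiplying by T maps them to noisy-label probabilities,
    # which are then scored against the observed noisy labels.
    clean_probs = softmax(logits)
    noisy_probs = clean_probs @ T
    return cross_entropy(noisy_probs, noisy_labels)

def mae_loss(logits, noisy_labels):
    # Noise-model-free (assumed example): mean absolute error between the
    # softmax output and the one-hot label; no noise model is estimated.
    probs = softmax(logits)
    one_hot = np.eye(probs.shape[1])[noisy_labels]
    return float(np.mean(np.abs(probs - one_hot).sum(axis=1)))

def symmetric_T(num_classes, noise_rate):
    # Hypothetical symmetric noise: a label is kept with probability
    # 1 - noise_rate, otherwise flipped uniformly to one of the other classes.
    T = np.full((num_classes, num_classes), noise_rate / (num_classes - 1))
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 3))      # toy network outputs
    labels = rng.integers(0, 3, size=8)   # toy (possibly noisy) labels
    T = symmetric_T(3, 0.2)
    print("forward-corrected CE:", forward_corrected_ce(logits, labels, T))
    print("MAE loss:", mae_loss(logits, labels))
```

With `T` set to the identity matrix, the forward-corrected loss reduces to ordinary cross-entropy. In practice `T` is usually unknown and has to be estimated, which is the core problem the noise-model-based group addresses.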
