The memorization effect of deep neural networks (DNNs) plays a pivotal role in recent label noise learning methods. To exploit this effect, model prediction-based methods have been widely adopted, which aim to leverage the outputs of DNNs in the early stage of learning to correct noisy labels. However, we observe that the model inevitably makes mistakes when predicting labels, resulting in unsatisfactory performance. By contrast, the features produced in the early stage of learning are considerably more robust. Inspired by this observation, in this paper, we propose a novel feature embedding-based method for deep learning with label noise, termed LabEl Noise Dilution (LEND). To be specific, we first compute a similarity matrix based on the current embedded features to capture the local structure of the training data. Then, the noisy supervision signals carried by mislabeled data are overwhelmed by those of nearby correctly labeled data (\textit{i.e.}, label noise dilution), whose effectiveness is guaranteed by the inherent robustness of the feature embedding. Finally, the training data with diluted labels are further used to train a robust classifier. Empirically, we conduct extensive experiments on both synthetic and real-world noisy datasets, comparing LEND with several representative robust learning approaches. The results verify the effectiveness of our LEND.
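The dilution step described above can be illustrated with a minimal sketch: a similarity matrix over early-stage embeddings, followed by a similarity-weighted vote among each sample's nearest neighbors. This is an assumption-laden illustration, not the paper's exact formulation; the cosine-similarity choice, the temperature-weighted $k$-NN vote, and the names `dilute_labels`, `k`, and `temperature` are all hypothetical.

```python
import numpy as np

def dilute_labels(features, noisy_labels, num_classes, k=10, temperature=0.1):
    """Hypothetical sketch of label noise dilution via feature-space neighbors.

    features:     (N, D) embeddings from the early stage of training.
    noisy_labels: (N,) integer labels, possibly corrupted.
    Returns an (N, C) soft label matrix where each sample's label is a
    similarity-weighted vote over its k nearest neighbors in feature space.
    """
    # Normalize embeddings so dot products equal cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                        # (N, N) similarity matrix
    np.fill_diagonal(sim, -np.inf)       # exclude self-similarity

    one_hot = np.eye(num_classes)[noisy_labels]    # (N, C) hard labels
    diluted = np.empty_like(one_hot)
    for i in range(len(f)):
        nn = np.argsort(sim[i])[-k:]               # k most similar samples
        w = np.exp(sim[i, nn] / temperature)       # similarity -> weight
        w /= w.sum()
        diluted[i] = w @ one_hot[nn]               # weighted label vote
    return diluted
```

Under this reading, a mislabeled point surrounded by correctly labeled neighbors receives a soft label dominated by the correct class, so its noisy supervision signal is diluted rather than memorized.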