Collecting large-scale datasets is crucial for training deep models; annotating the data, however, inevitably yields noisy labels, which pose challenges to deep learning algorithms. Previous efforts tend to mitigate this problem by identifying and removing noisy samples, or by correcting their labels according to statistical properties (e.g., loss values) of the training samples. In this paper, we tackle the problem from a new perspective: delving into the deep feature maps, we empirically find that models trained on clean and on mislabeled samples manifest distinguishable activation feature distributions. Building on this observation, we propose a novel robust training approach termed adversarial noisy masking. The idea is to regularize deep features with a label-quality-guided masking scheme that adaptively modulates the input data and the label simultaneously, preventing the model from overfitting noisy samples. Furthermore, an auxiliary task is designed to reconstruct the input data; it naturally provides noise-free self-supervised signals that reinforce the generalization ability of deep models. The proposed method is simple and flexible. We evaluate it on both synthetic and real-world noisy datasets, where it achieves significant improvements over previous state-of-the-art methods.
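To make the core idea concrete, the sketch below illustrates one plausible form of a label-quality-guided masking step plus a reconstruction auxiliary loss, as described above. It is a minimal sketch, not the authors' implementation: the function names (`quality_guided_mask`, `training_loss`), the patch size, the mask-ratio schedule, and the quality estimate `quality` (e.g., derived from per-sample loss statistics) are all hypothetical assumptions for illustration.

```python
# Minimal sketch of label-quality-guided noisy masking (hypothetical names).
# `quality` is assumed given per sample: 1.0 = likely clean, 0.0 = likely noisy.
import torch
import torch.nn.functional as F

def quality_guided_mask(images, labels, num_classes, quality, patch=16):
    """Mask input patches and soften labels in proportion to estimated
    label quality: lower quality -> stronger input masking + softer label."""
    b, c, h, w = images.shape
    mask_ratio = 0.5 * (1.0 - quality)                      # (b,), assumed schedule
    gh, gw = h // patch, w // patch
    keep_prob = 1.0 - mask_ratio.view(b, 1, 1)
    grid = (torch.rand(b, gh, gw, device=images.device) < keep_prob).float()
    mask = grid.repeat_interleave(patch, 1).repeat_interleave(patch, 2)
    masked = images * mask.unsqueeze(1)                     # zero out masked patches
    one_hot = F.one_hot(labels, num_classes).float()
    uniform = torch.full_like(one_hot, 1.0 / num_classes)
    q = quality.view(b, 1)
    soft_labels = q * one_hot + (1.0 - q) * uniform         # modulate the label too
    return masked, soft_labels

def training_loss(logits, soft_labels, recon, images, lam=0.1):
    """Classification on softened labels + reconstruction of the input,
    which supplies a noise-free self-supervised signal."""
    cls = -(soft_labels * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    rec = F.mse_loss(recon, images)
    return cls + lam * rec
```

Under these assumptions, a likely-noisy sample contributes a heavily masked input and a near-uniform target, so the classifier receives little incentive to memorize its wrong label, while the reconstruction term keeps training informative for every sample regardless of label quality.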