Acquiring accurate labels for large-scale datasets is both time-consuming and expensive. To reduce the dependence of deep learning models on clean labeled data, several recent research efforts have focused on learning with noisy labels. These methods typically fall into three design categories for learning a noise-robust model: sample selection approaches, noise-robust loss functions, and label correction methods. In this paper, we propose PARS: Pseudo-Label Aware Robust Sample Selection, a hybrid approach that combines the strengths of all three categories in a joint-training framework to achieve robustness to noisy labels. Specifically, PARS exploits all training samples using both the raw/noisy labels and estimated/refurbished pseudo-labels obtained via self-training, divides samples into an ambiguous and a noisy subset via loss analysis, and designs label-dependent noise-aware loss functions for both sets of filtered labels. In extensive experiments on the noisy CIFAR-10 and CIFAR-100 datasets, PARS significantly outperforms the state of the art, particularly in challenging high-noise and low-resource settings. In particular, PARS achieves an absolute 12% improvement in test accuracy on the CIFAR-100 dataset with 90% symmetric label noise, and an absolute 27% improvement in test accuracy under the additional restriction that only 1/5 of the noisy labels are available during training. On a real-world noisy dataset, Clothing1M, PARS achieves results competitive with the state of the art.
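To make the loss-analysis step concrete: methods in this family commonly exploit the observation that, early in training, samples with clean labels tend to incur smaller losses than mislabeled ones. Below is a minimal, illustrative sketch that partitions samples by clustering their per-sample losses with a simple 1-D two-means procedure. This is an assumption-laden stand-in (the function name `split_by_loss` and the two-means criterion are ours for illustration); the exact division criterion used by PARS is described in the paper, not here.

```python
import numpy as np

def split_by_loss(losses, n_iter=20):
    """Partition samples into a low-loss ("likely clean/ambiguous") and a
    high-loss ("likely noisy") subset via 1-D two-means clustering of the
    per-sample training losses. Illustrative only; not the PARS criterion.
    """
    losses = np.asarray(losses, dtype=float)
    lo, hi = losses.min(), losses.max()  # initialize centroids at the extremes
    for _ in range(n_iter):
        # assign each sample to its nearest centroid
        in_low = np.abs(losses - lo) <= np.abs(losses - hi)
        lo = losses[in_low].mean()       # update low-loss centroid
        hi = losses[~in_low].mean()      # update high-loss centroid
    return np.flatnonzero(in_low), np.flatnonzero(~in_low)

# Usage: per-sample cross-entropy losses from an early training epoch
losses = np.concatenate([np.full(50, 0.2), np.full(50, 2.0)])
clean_idx, noisy_idx = split_by_loss(losses)
```

Downstream, the low-loss subset would be trained on its raw labels, while the high-loss subset would instead rely on refurbished pseudo-labels, matching the label-dependent treatment sketched in the abstract.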