Deep neural networks (DNNs) have achieved remarkable success in a variety of computer vision tasks, where massive labeled images are routinely required for model optimization. Yet, data collected from the open world are unavoidably polluted by noise, which may significantly undermine the efficacy of the learned models. Various attempts have been made to reliably train DNNs under data noise, but they account for either the noise in the labels or the noise in the images, never both. A naive combination of the two lines of work would suffer from the limitations on both sides and miss the opportunity to handle the two kinds of noise in parallel. This work provides a first, unified framework for reliable learning under joint (image, label) noise. Technically, we develop a confidence-based sample filter that progressively filters out noisy data without the need to pre-specify the noise ratio. We then penalize the model uncertainty on the detected noisy data instead of letting the model continue to over-fit the misleading information they carry. Experimental results on various challenging synthetic and real-world noisy datasets verify that the proposed method outperforms competing baselines in terms of classification performance.
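To make the two components described above concrete, the following is a minimal sketch of how a confidence-based sample filter combined with an uncertainty penalty on suspected-noisy samples could be implemented in a standard classification training loop. The function name `confidence_filter_loss`, the fixed confidence threshold (the abstract implies a progressive schedule rather than a constant), the entropy-based form of the uncertainty penalty, and the weighting parameter are all illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def confidence_filter_loss(logits, labels, conf_threshold=0.9, uncertainty_weight=0.1):
    """Illustrative sketch: samples whose predicted confidence exceeds a
    threshold are treated as clean and trained with cross-entropy; the
    remaining (suspected-noisy) samples are not fitted to their labels,
    but instead receive a penalty on the model's predictive uncertainty."""
    probs = F.softmax(logits, dim=1)
    confidence, _ = probs.max(dim=1)             # per-sample predicted confidence
    clean_mask = confidence >= conf_threshold    # filter without a pre-specified noise ratio

    # Cross-entropy only on samples the filter deems clean.
    if clean_mask.any():
        ce_loss = F.cross_entropy(logits[clean_mask], labels[clean_mask])
    else:
        ce_loss = logits.new_zeros(())

    # Penalize uncertainty (here: predictive entropy) of the detected noisy
    # samples instead of fitting their potentially corrupted labels/images.
    noisy_mask = ~clean_mask
    if noisy_mask.any():
        noisy_probs = probs[noisy_mask]
        entropy = -(noisy_probs * noisy_probs.clamp_min(1e-12).log()).sum(dim=1).mean()
    else:
        entropy = logits.new_zeros(())

    return ce_loss + uncertainty_weight * entropy
```

In practice, `conf_threshold` would be raised (or otherwise scheduled) over training epochs to realize the "progressive" filtering described in the abstract; the constant value here is only for readability of the sketch.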