The ability to train deep neural networks under label noise is appealing, as imperfectly annotated data are relatively cheap to obtain. State-of-the-art approaches are based on semi-supervised learning (SSL), which selects small-loss examples as clean and then applies SSL techniques for boosted performance. However, the selection step mostly yields a medium-sized, decent-enough clean subset, overlooking a rich set of additional clean samples. In this work, we propose ProMix, a novel noisy-label learning framework that attempts to maximize the utility of clean samples for boosted performance. Key to our method is a matched high-confidence selection technique that selects those examples with high confidence whose predictions match their given labels. Combined with the small-loss selection, our method achieves a precision of 99.27% and a recall of 98.22% in detecting clean samples on the CIFAR-10N dataset. Based on such a large set of clean data, ProMix improves the best baseline method by +2.67% on CIFAR-10N and +1.61% on CIFAR-100N. The code and data are available at https://github.com/Justherozen/ProMix
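To make the two selection criteria concrete, below is a minimal PyTorch sketch of the selection step as described above. The threshold `tau`, the `small_loss_ratio`, and the choice to take the union of the two masks are illustrative assumptions, not the paper's actual hyperparameters or combination rule; see the repository linked above for the authors' implementation.

```python
import torch
import torch.nn.functional as F

def select_clean(logits: torch.Tensor,
                 labels: torch.Tensor,
                 losses: torch.Tensor,
                 tau: float = 0.95,
                 small_loss_ratio: float = 0.5) -> torch.Tensor:
    """Return a boolean mask marking examples selected as clean.

    Combines two selectors (hyperparameters here are illustrative):
      (1) matched high-confidence: the predicted class agrees with the
          given label AND the softmax confidence exceeds `tau`;
      (2) small-loss: the per-example loss falls within the smallest
          `small_loss_ratio` fraction of the batch.
    """
    probs = F.softmax(logits, dim=1)
    conf, preds = probs.max(dim=1)

    # (1) high confidence and prediction matches the given label
    matched = (preds == labels) & (conf >= tau)

    # (2) small-loss: keep the lowest-loss fraction of examples
    k = max(1, int(small_loss_ratio * losses.numel()))
    loss_threshold = losses.kthvalue(k).values
    small_loss = losses <= loss_threshold

    # union of the two selectors enlarges the clean set beyond
    # what small-loss selection alone would recover
    return matched | small_loss
```

In this sketch, examples picked by either criterion count as clean, which is one plausible way the matched high-confidence selector can recover correct samples that the small-loss criterion alone would miss.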