Many state-of-the-art noisy-label learning methods rely on learning mechanisms that estimate the samples' clean labels during training and discard the original noisy labels. However, this approach prevents the learning of the relationship between images, noisy labels and clean labels, which has been shown to be useful when dealing with instance-dependent label noise. Furthermore, methods that do aim to learn this relationship require a cleanly annotated subset of data, as well as distillation or multi-faceted models for training. In this paper, we propose a new training algorithm that relies on a simple model to learn the relationship between clean and noisy labels without the need for a cleanly labelled subset of data. Our algorithm follows a 3-stage process, namely: 1) self-supervised pre-training followed by an early-stopped training of the classifier to confidently predict clean labels for a subset of the training set; 2) bootstrapping of the relationship between images, noisy labels and clean labels from the clean set of stage (1), which we exploit to effectively relabel the remaining training set using semi-supervised learning; and 3) supervised training of the classifier on all relabelled samples from stage (2). By learning this relationship, we achieve state-of-the-art performance on asymmetric and instance-dependent label noise problems.
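To make the pipeline concrete, below is a minimal PyTorch sketch of the three stages. It assumes an `encoder` that has already been self-supervised pre-trained and a data loader yielding `(image, noisy_label, index)` triples; the confidence threshold `tau`, the epoch budgets, and the `RelationHead` architecture are illustrative assumptions rather than the paper's exact recipe, and stage 2 is shown as plain pseudo-labelling in place of the full semi-supervised procedure.

```python
# Illustrative sketch of the 3-stage algorithm; hyper-parameters and the
# RelationHead design are assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationHead(nn.Module):
    """Predicts the clean label from image features and the noisy label,
    modelling the image / noisy-label / clean-label relationship."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.num_classes = num_classes
        self.fc = nn.Linear(feat_dim + num_classes, num_classes)

    def forward(self, feats, y_noisy):
        onehot = F.one_hot(y_noisy, self.num_classes).float()
        return self.fc(torch.cat([feats, onehot], dim=1))


def stage1_clean_subset(encoder, classifier, loader, opt, epochs=5, tau=0.9):
    # Early-stopped warm-up on the noisy labels (few epochs only) ...
    encoder.train(); classifier.train()
    for _ in range(epochs):
        for x, y_noisy, _ in loader:
            opt.zero_grad()
            F.cross_entropy(classifier(encoder(x)), y_noisy).backward()
            opt.step()
    # ... then keep samples whose prediction confidently agrees with y_noisy.
    encoder.eval(); classifier.eval()
    clean = {}
    with torch.no_grad():
        for x, y_noisy, idx in loader:
            probs = F.softmax(classifier(encoder(x)), dim=1)
            conf, pred = probs.max(dim=1)
            for i, c, p, y in zip(idx, conf, pred, y_noisy):
                if c > tau and p == y:
                    clean[int(i)] = int(p)
    return clean  # index -> trusted clean label


def stage2_relabel(encoder, relation, loader, opt, clean, epochs=10):
    # Fit p(clean | image, noisy label) on the bootstrapped clean set.
    relation.train()
    for _ in range(epochs):
        for x, y_noisy, idx in loader:
            mask = torch.tensor([int(i) in clean for i in idx])
            if not mask.any():
                continue
            target = torch.tensor([clean[int(i)] for i in idx[mask]])
            opt.zero_grad()
            logits = relation(encoder(x[mask]), y_noisy[mask])
            F.cross_entropy(logits, target).backward()
            opt.step()
    # Relabel the remaining samples with the relation head's prediction.
    relation.eval()
    labels = dict(clean)
    with torch.no_grad():
        for x, y_noisy, idx in loader:
            pred = relation(encoder(x), y_noisy).argmax(dim=1)
            for i, p in zip(idx, pred):
                labels.setdefault(int(i), int(p))
    return labels  # index -> relabelled target for every training sample


def stage3_train(encoder, classifier, loader, opt, labels, epochs=50):
    # Standard supervised training on the fully relabelled training set.
    encoder.train(); classifier.train()
    for _ in range(epochs):
        for x, _, idx in loader:
            target = torch.tensor([labels[int(i)] for i in idx])
            opt.zero_grad()
            F.cross_entropy(classifier(encoder(x)), target).backward()
            opt.step()
```

Conditioning the relation head on the noisy label is the key design point: it lets the model exploit the image / noisy-label / clean-label relationship, in contrast to methods that discard the noisy label once a clean estimate exists.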