In learning tasks with label noise, improving model robustness against overfitting is a pivotal challenge because the model eventually memorizes all labels, including the noisy ones. Identifying samples with corrupted labels and preventing the model from learning them is a promising way to address this challenge. Per-sample training loss is a previously studied criterion, where samples with small loss are treated as clean samples on which the model should be trained. In this work, we first demonstrate the ineffectiveness of this small-loss trick. We then propose a novel discriminator metric called confidence error and a sieving strategy called CONFES to effectively differentiate between clean and noisy samples. We experimentally illustrate the superior performance of our proposed approach compared to recent studies across various settings, including synthetic and real-world label noise. Moreover, we show that CONFES can be combined with other approaches, such as Co-teaching and DivideMix, to further improve model performance.
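For context, the small-loss trick mentioned above is commonly implemented by ranking per-sample losses and keeping only the lowest-loss fraction of a batch as presumed-clean samples. The sketch below is a minimal, generic illustration of that baseline (not the paper's CONFES method); the selection ratio `clean_ratio` is a hypothetical hyperparameter that is typically tied to an estimate of the noise rate.

```python
# Minimal sketch of the small-loss trick: treat the samples with the
# smallest per-sample loss as clean and train only on them.
import torch
import torch.nn.functional as F


def small_loss_selection(logits: torch.Tensor,
                         labels: torch.Tensor,
                         clean_ratio: float) -> torch.Tensor:
    """Return a boolean mask marking the presumed-clean (smallest-loss) samples."""
    per_sample_loss = F.cross_entropy(logits, labels, reduction="none")
    num_clean = int(clean_ratio * labels.numel())
    clean_idx = torch.argsort(per_sample_loss)[:num_clean]
    mask = torch.zeros_like(labels, dtype=torch.bool)
    mask[clean_idx] = True
    return mask


# Toy usage: 8 samples, 3 classes; keep the 50% of samples with the lowest loss
# and compute the training loss only on that subset.
logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
mask = small_loss_selection(logits, labels, clean_ratio=0.5)
filtered_loss = F.cross_entropy(logits[mask], labels[mask])
```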