Sample selection is an effective strategy for mitigating the effect of label noise in robust learning. Typical strategies commonly apply the small-loss criterion to identify clean samples. However, samples lying near the decision boundary incur large losses and are easily entangled with noisy examples; discarding them under this criterion severely degrades generalization performance. In this paper, we propose a novel selection strategy, \textbf{S}elf-\textbf{F}il\textbf{t}ering (SFT), which exploits the fluctuation of noisy examples in historical predictions to filter them out, thereby avoiding the selection bias that the small-loss criterion imposes on boundary examples. Specifically, we introduce a memory bank module that stores the historical predictions of each example and is updated dynamically to support selection in the subsequent learning iteration. Moreover, to reduce the error accumulated by the sample selection bias of SFT, we devise a regularization term that penalizes confident output distributions. By increasing the weight of the misclassified categories, this term makes the loss function robust to label noise under mild conditions. We conduct extensive experiments on three benchmarks with various noise types and achieve new state-of-the-art results. Ablation studies and further analysis verify the effectiveness of SFT for sample selection in robust learning.
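To make the two components of the abstract concrete, the following is a minimal sketch of the fluctuation criterion and a generic confidence penalty. All names here (\texttt{MemoryBank}, \texttt{fluctuated}, the window size, the penalty coefficient) are illustrative assumptions rather than the authors' reference implementation, and the penalty shown is the generic negative-entropy regularizer of Pereyra et al. (2017), standing in for the paper's exact term.

\begin{verbatim}
# Illustrative sketch only: a memory bank of per-epoch predictions and a
# fluctuation test, plus a generic confidence-penalized loss. Names and
# hyperparameters are assumptions, not the paper's implementation.
import numpy as np

class MemoryBank:
    """Stores the predicted label of each training example per epoch."""
    def __init__(self, num_examples, window=3):
        self.history = np.full((num_examples, window), -1, dtype=np.int64)

    def update(self, indices, predicted_labels):
        """Shift the window left and record this epoch's predictions."""
        self.history[indices] = np.roll(self.history[indices], -1, axis=1)
        self.history[indices, -1] = predicted_labels

    def fluctuated(self, indices, given_labels):
        """An example 'fluctuates' if it agreed with its given label at some
        earlier epoch in the window but disagrees at the latest one."""
        h = self.history[indices]
        was_correct = (h[:, :-1] == given_labels[:, None]).any(axis=1)
        now_wrong = h[:, -1] != given_labels
        return was_correct & now_wrong

def confidence_penalized_loss(probs, label, beta=0.1):
    """Cross-entropy minus beta times the prediction entropy, so confident
    (low-entropy) outputs are penalized; a generic stand-in for the paper's
    regularization term."""
    ce = -np.log(probs[label])
    entropy = -np.sum(probs * np.log(probs))
    return ce - beta * entropy

# Usage: keep examples that do NOT fluctuate as the clean set for the
# next training iteration.
bank = MemoryBank(num_examples=6, window=3)
labels = np.array([0, 1, 2, 0, 1, 2])
idx = np.arange(6)
for preds in [np.array([0, 1, 2, 0, 1, 2]),   # epoch 1: all match labels
              np.array([0, 1, 2, 1, 1, 2]),   # epoch 2: example 3 flips
              np.array([0, 0, 2, 1, 1, 2])]:  # epoch 3: example 1 flips
    bank.update(idx, preds)
clean_idx = idx[~bank.fluctuated(idx, labels)]
print(clean_idx)  # examples 1 and 3 are filtered as likely noisy
\end{verbatim}

The design intuition sketched here is that boundary examples keep large losses yet are predicted consistently, while mislabeled examples tend to flip between agreeing and disagreeing with their given labels across epochs, so tracking predictions rather than losses avoids discarding the hard clean samples.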