Deep models trained with noisy labels are prone to overfitting and generalize poorly. Most existing solutions rely on the idealized assumption that label noise is class-conditional, i.e., instances of the same class share the same noise model, independent of their features. In practice, however, real-world noise patterns are usually more fine-grained and instance-dependent, which poses a major challenge, especially in the presence of inter-class imbalance. In this paper, we propose a two-stage clean sample identification method to address this challenge. First, we employ a class-level feature clustering procedure for the early identification of clean samples that lie near the class-wise prediction centers. Notably, we address class imbalance by aggregating rare classes according to their prediction entropy. Second, for the remaining clean samples that lie close to the ground-truth class boundary (where they are usually mixed with samples carrying instance-dependent noise), we propose a novel consistency-based classification method that identifies them through the consistency of two classifier heads: the higher the consistency, the more likely a sample is clean. Extensive experiments on several challenging benchmarks demonstrate that our method outperforms state-of-the-art approaches.
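The two-stage selection described above can be illustrated with a minimal sketch. It rests on our own assumptions and is not the authors' implementation: stage 1 keeps the fraction `keep_ratio` of each (noisily) labeled class that lies closest to the class mean vector, standing in for the class-level clustering around class-wise centers; the entropy-based aggregation of rare classes is omitted; and stage 2 scores the agreement of two heads as one minus the total variation distance between their predicted distributions, accepting samples above a `threshold`. All function names and parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def class_centroids(features: Sequence[Vector], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Mean vector of the samples carrying each (possibly noisy) label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def stage1_near_center(features: Sequence[Vector], labels: Sequence[int],
                       keep_ratio: float = 0.5) -> Set[int]:
    """Stage 1 (assumed form): per class, keep the keep_ratio fraction of samples
    nearest to the class centroid. Rare-class aggregation is NOT modeled here."""
    centroids = class_centroids(features, labels)
    by_class: Dict[int, List[Tuple[float, int]]] = {}
    for i, (f, y) in enumerate(zip(features, labels)):
        by_class.setdefault(y, []).append((math.dist(f, centroids[y]), i))
    clean: Set[int] = set()
    for items in by_class.values():
        items.sort()  # nearest to the centroid first
        k = max(1, int(len(items) * keep_ratio))
        clean.update(i for _, i in items[:k])
    return clean


def head_consistency(p: Vector, q: Vector) -> float:
    """1 minus the total variation distance between two heads' class
    probabilities: 1.0 for identical predictions, 0.0 for disjoint ones."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def stage2_consistency(probs_a: Sequence[Vector], probs_b: Sequence[Vector],
                       candidates: Sequence[int], threshold: float = 0.8) -> Set[int]:
    """Stage 2 (assumed form): among leftover samples, accept those on which
    the two classifier heads agree at least `threshold`."""
    return {i for i in candidates
            if head_consistency(probs_a[i], probs_b[i]) >= threshold}


def identify_clean(features: Sequence[Vector], labels: Sequence[int],
                   probs_a: Sequence[Vector], probs_b: Sequence[Vector],
                   keep_ratio: float = 0.5, threshold: float = 0.8) -> Set[int]:
    """Union of near-center samples (stage 1) and consistent leftovers (stage 2)."""
    stage1 = stage1_near_center(features, labels, keep_ratio)
    remaining = [i for i in range(len(labels)) if i not in stage1]
    return stage1 | stage2_consistency(probs_a, probs_b, remaining, threshold)


# Toy 1-D data: samples 0, 3, 4 and 7 are not among the nearest half of their
# class, so stage 2 decides them by how well the two heads agree.
features = [[0.0], [1.0], [2.0], [9.0], [10.0], [11.0], [13.0], [20.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
probs_a = [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5], [0.2, 0.8],
           [0.1, 0.9], [0.5, 0.5], [0.5, 0.5], [0.6, 0.4]]
probs_b = [[0.85, 0.15], [0.5, 0.5], [0.5, 0.5], [0.7, 0.3],
           [0.2, 0.8], [0.5, 0.5], [0.5, 0.5], [0.1, 0.9]]
print(sorted(identify_clean(features, labels, probs_a, probs_b)))  # [0, 1, 2, 4, 5, 6]
```

In this toy run, stage 1 keeps samples 1, 2, 5 and 6, which sit closest to their class means; stage 2 then accepts the boundary samples 0 and 4, where the heads nearly agree, and rejects 3 and 7, where they disagree strongly.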