Conventional de-noising methods rely on the assumption that all samples are independent and identically distributed, so the resulting classifier, though disturbed by noise, can still easily identify noisy samples as outliers of the training distribution. However, this assumption is unrealistic for large-scale data, which is inevitably long-tailed. Such imbalanced training data makes the classifier less discriminative for the tail classes, whose previously "easy" noises now turn into "hard" ones: they are almost indistinguishable from the clean tail samples, as both look like outliers. We introduce this new challenge as Noisy Long-Tailed Classification (NLT). Not surprisingly, we find that most de-noising methods fail to identify the hard noises, resulting in significant performance drops on the three proposed NLT benchmarks: ImageNet-NLT, Animal10-NLT, and Food101-NLT. To this end, we design an iterative noisy-learning framework called Hard-to-Easy (H2E). Our bootstrapping philosophy is to first learn a classifier as a noise identifier that is invariant to class and context distributional changes, reducing "hard" noises to "easy" ones, whose removal further improves the invariance. Experimental results show that our H2E outperforms state-of-the-art de-noising methods and their ablations in long-tailed settings while maintaining stable performance in conventional balanced settings. Datasets and code are available at https://github.com/yxymessi/H2E-Framework
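The abstract only sketches the iterative Hard-to-Easy loop, so the following is a minimal illustrative sketch, not the authors' implementation. It approximates "invariance to class distributional changes" with class-balanced per-sample loss ranking, so that tail classes are not over-pruned, and repeats the loop so that newly "easy" noises surface in later rounds. All names here (h2e_sketch, drop_frac, the choice of LogisticRegression as the noise identifier) are hypothetical stand-ins.

```python
# Hypothetical sketch of an iterative hard-to-easy de-noising loop.
# NOT the H2E method itself: the true noise identifier is an
# invariant classifier; here we approximate the effect with
# class-balanced pruning of high-loss samples.
import numpy as np
from sklearn.linear_model import LogisticRegression

def h2e_sketch(X, y, rounds=3, drop_frac=0.1):
    """Return indices of samples judged clean after iterative pruning."""
    keep = np.arange(len(y))
    for _ in range(rounds):
        # Step 1: fit a simple classifier on the currently kept samples.
        clf = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
        # Step 2: per-sample negative log-likelihood as a noise score.
        proba = clf.predict_proba(X[keep])
        col = {c: j for j, c in enumerate(clf.classes_)}
        cols = np.array([col[c] for c in y[keep]])
        nll = -np.log(proba[np.arange(len(keep)), cols] + 1e-12)
        # Step 3: drop the highest-loss fraction WITHIN each class, so
        # tail classes are not wiped out by the imbalanced head.
        next_keep = []
        for c in np.unique(y[keep]):
            mask = y[keep] == c
            idx, scores = keep[mask], nll[mask]
            n_drop = int(drop_frac * len(idx))
            order = np.argsort(scores)  # low loss (likely clean) first
            next_keep.extend(idx[order[: len(idx) - n_drop]].tolist())
        keep = np.array(sorted(next_keep))
    return keep
```

The class-balanced pruning is the key design choice in this sketch: a global loss threshold would mostly discard clean tail samples (the "hard" noise problem described above), whereas ranking within each class removes suspected noise at the same rate for head and tail, loosely mirroring the invariance that H2E learns explicitly.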