Long-tailed learning has attracted much attention recently, with the goal of improving generalisation for tail classes. Most existing works use supervised learning without considering the label noise prevalent in training datasets. To move long-tailed learning towards more realistic scenarios, this work investigates the label noise problem under a long-tailed label distribution. We first observe the negative impact of noisy labels on the performance of existing methods, revealing the intrinsic challenges of this problem. We then find that the small-loss trick, the most commonly used approach for coping with noisy labels in prior literature, fails under a long-tailed label distribution, because deep neural networks cannot distinguish correctly-labeled from mislabeled examples on tail classes. To overcome this limitation, we establish a new prototypical noise detection method by designing a distance-based metric that is resistant to label noise. Based on these findings, we propose a robust framework,~\algo, which performs noise detection for long-tailed learning, followed by soft pseudo-labeling via both label smoothing and diverse label guessing. Moreover, our framework can naturally leverage semi-supervised learning algorithms to further improve generalisation. Extensive experiments on benchmark and real-world datasets demonstrate the superiority of our method over existing baselines. In particular, our method outperforms DivideMix by 3\% in test accuracy. Source code will be released soon.
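To make the abstract's "distance-based metric" idea concrete, the following is a minimal sketch of prototypical noise detection in general: class prototypes are computed as mean embeddings and an example is flagged as possibly mislabeled when its nearest prototype disagrees with its given label. All names (`detect_noisy_labels`, the normalization and distance choices) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of distance-based prototypical noise detection.
# Not the paper's implementation; details (normalization, distance) are assumptions.
import numpy as np

def detect_noisy_labels(features, labels, num_classes):
    """Flag examples whose embedding is closer to another class prototype
    than to the prototype of its assigned (possibly noisy) class."""
    # L2-normalize embeddings so distances are comparable across examples.
    features = features / np.linalg.norm(features, axis=1, keepdims=True)

    # Class prototypes: mean normalized embedding of each class under the given labels.
    prototypes = np.stack([
        features[labels == c].mean(axis=0) for c in range(num_classes)
    ])
    prototypes = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)

    # Euclidean distance from every example to every prototype.
    dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=2)

    # Suspect label noise when the nearest prototype disagrees with the given label.
    nearest = dists.argmin(axis=1)
    return nearest != labels

# Toy usage with random embeddings (two classes, 4-dimensional features).
rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 4))
labs = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
print(detect_noisy_labels(feats, labs, num_classes=2))
```

The appeal of such a distance-based criterion, as the abstract argues, is that it does not rely on per-example loss values, which become unreliable for tail classes where correctly-labeled and mislabeled examples both incur large losses.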