Learning with noisy labels has attracted much research interest, since data annotations, especially for large-scale datasets, are inevitably imperfect. Recent approaches recast the problem as semi-supervised learning by dividing training samples into clean and noisy sets. This paradigm, however, degrades significantly under heavy label noise, as the number of clean samples is too small for conventional methods to perform well. In this paper, we introduce a novel framework, termed LC-Booster, to explicitly tackle learning under extreme noise. The core idea of LC-Booster is to incorporate label correction into sample selection, so that more purified samples, obtained through reliable label correction, can be utilized for training, thereby alleviating confirmation bias. Experiments show that LC-Booster advances state-of-the-art results on several noisy-label benchmarks, including CIFAR-10, CIFAR-100, Clothing1M, and WebVision. Remarkably, under the extreme 90\% noise ratio, LC-Booster achieves 92.9\% and 48.4\% accuracy on CIFAR-10 and CIFAR-100 respectively, surpassing state-of-the-art methods by a large margin.
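The abstract's core idea, combining small-loss sample selection with confidence-based label correction, can be illustrated with a minimal sketch. This is not the paper's implementation: the two-means split on per-sample losses (a stand-in for the Gaussian-mixture fit common in divide-style methods), the `conf_thresh` parameter, and the function name `select_and_correct` are all illustrative assumptions.

```python
import numpy as np

def select_and_correct(losses, probs, labels, conf_thresh=0.9):
    """Split samples into clean/noisy by per-sample loss, then
    correct labels of confidently-predicted noisy samples.
    Hypothetical sketch; not the LC-Booster implementation."""
    # Small-loss criterion: low-loss samples are presumed clean.
    # Simple 1-D two-means on the losses as a stand-in for a GMM.
    c = np.array([losses.min(), losses.max()], dtype=float)
    for _ in range(20):
        assign = np.abs(losses[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if (assign == k).any():
                c[k] = losses[assign == k].mean()
    clean = assign == c.argmin()  # cluster with the smaller centroid

    # Label correction: for noisy samples whose model prediction is
    # confident enough, replace the given label with the predicted class,
    # enlarging the pool of trustworthy training samples.
    corrected = labels.copy()
    confident = (~clean) & (probs.max(axis=1) >= conf_thresh)
    corrected[confident] = probs.argmax(axis=1)[confident]
    return clean, corrected
```

On synthetic inputs, low-loss samples land in the clean set unchanged, while a high-loss sample with a confident prediction has its (possibly wrong) label replaced by the predicted class; an unconfident noisy sample keeps its original label rather than risk a bad correction.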