We propose a simulation framework that generates realistic instance-dependent noisy labels via a pseudo-labeling paradigm. By comparison with the CIFAR10-H dataset, we show that the synthetic noisy labels produced by this framework exhibit important characteristics of label noise in practical settings. Equipped with controllable label noise, we study the negative impact of noisy labels across a few realistic settings to understand when label noise is more problematic. We also benchmark several existing algorithms for learning with noisy labels, comparing their behavior on our synthetic datasets with their behavior on datasets with independent random label noise. Additionally, using the annotator information available from our simulation framework, we propose a new technique, Label Quality Model (LQM), that leverages annotator features to predict and correct noisy labels. We show that adding LQM as a label correction step before applying existing noisy-label techniques further improves the models' performance.
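To make the contrast between independent random label noise and instance-dependent noise concrete, the following is a minimal sketch and not the framework described in the abstract. It assumes that a pseudo-labeling "rater" can be represented by a matrix of per-example class probabilities; the function names `symmetric_noise` and `pseudo_label_noise` and the toy `probs` values are hypothetical and chosen only for illustration.

```python
import numpy as np


def symmetric_noise(labels, num_classes, noise_rate, rng):
    """Independent random noise: each label is flipped with the same
    probability, regardless of which example it belongs to."""
    labels = np.asarray(labels)
    flip = rng.random(labels.shape[0]) < noise_rate
    # An offset in 1..num_classes-1 guarantees the new class differs.
    offsets = rng.integers(1, num_classes, size=labels.shape[0])
    noisy = labels.copy()
    noisy[flip] = (labels[flip] + offsets[flip]) % num_classes
    return noisy


def pseudo_label_noise(probs, rng):
    """Instance-dependent noise (assumed form): sample one label per
    example from that example's predicted class distribution, so that
    ambiguous examples are mislabeled more often than easy ones."""
    probs = np.asarray(probs, dtype=float)
    cdf = np.cumsum(probs, axis=1)
    cdf /= cdf[:, -1:]  # normalize so the last column is exactly 1.0
    u = rng.random((probs.shape[0], 1))
    # Inverse-CDF sampling: first class whose cumulative mass exceeds u.
    return (u < cdf).argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.array([0, 1, 2, 0])
    # Hypothetical rater outputs: rows 0-1 are confident, rows 2-3 are ambiguous.
    probs = np.array([
        [0.98, 0.01, 0.01],
        [0.01, 0.98, 0.01],
        [0.10, 0.45, 0.45],
        [0.40, 0.30, 0.30],
    ])
    print("independent:", symmetric_noise(clean, 3, 0.3, rng))
    print("instance-dependent:", pseudo_label_noise(probs, rng))
```

In this sketch the independent scheme corrupts every example at the same rate, while the instance-dependent scheme concentrates errors on examples whose predicted distribution is spread across classes.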