We propose a simulation framework for generating instance-dependent noisy labels via a pseudo-labeling paradigm. We show that the distribution of synthetic noisy labels generated with our framework is closer to that of human labels than independent or class-conditional random flipping. Equipped with controllable label noise, we study the negative impact of noisy labels across several practical settings to understand when label noise is most problematic. We also benchmark several existing algorithms for learning with noisy labels, comparing their behavior on our synthetic datasets and on datasets with independent random label noise. Additionally, using the annotator information made available by our simulation framework, we propose a new technique, Label Quality Model (LQM), that leverages annotator features to predict and correct noisy labels. We show that adding LQM as a label correction step before applying existing noisy-label techniques further improves model performance.
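To make the contrast between the two noise models concrete, here is a minimal sketch of (a) class-conditional random flipping via a transition matrix and (b) instance-dependent noise sampled from a weak model's predicted distribution, in the spirit of pseudo-labeling. All names and the synthetic "weak model" are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, n = 3, 1000
y_true = rng.integers(0, num_classes, size=n)

# (a) Class-conditional flipping: noise depends only on the true class.
# Row i of T is P(noisy label | true label = i); here a hypothetical
# 20% symmetric noise rate.
T = np.full((num_classes, num_classes), 0.1)
np.fill_diagonal(T, 0.8)
y_cc = np.array([rng.choice(num_classes, p=T[y]) for y in y_true])

# (b) Instance-dependent noise via pseudo-labeling (sketch).
# `logits` stands in for a weak model's per-instance scores; in a real
# pipeline these would come from a model trained on limited clean data.
logits = rng.normal(size=(n, num_classes))
logits[np.arange(n), y_true] += 2.0  # the weak model favors the true class
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
# Sampling from the softmax makes the noise depend on each instance:
# ambiguous instances (flat probs) are mislabeled more often.
y_id = np.array([rng.choice(num_classes, p=p) for p in probs])
```

Under (a) every instance of a class shares the same flip probability, whereas under (b) the error rate varies per instance with the weak model's confidence, which is what makes the resulting noise distribution closer to human labeling errors.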