Instance- and Label-dependent label Noise (ILN) is widespread in real-world datasets but has rarely been studied. In this paper, we focus on Bounded Instance- and Label-dependent label Noise (BILN), a special case of ILN in which the label noise rates -- the probabilities that the true labels of examples flip into corrupted ones -- have an upper bound strictly less than $1$. Specifically, we introduce the concept of distilled examples, i.e., examples whose labels are identical to those assigned to them by the Bayes optimal classifier, and prove that under certain conditions classifiers learnt on distilled examples converge to the Bayes optimal classifier. Inspired by the idea of learning with distilled examples, we then propose a learning algorithm with theoretical guarantees of robustness to BILN. Finally, empirical evaluations on both synthetic and real-world datasets demonstrate the effectiveness of our algorithm in learning with BILN.
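As a concrete illustration (notation ours, not fixed by the abstract), assuming binary labels $Y \in \{-1, +1\}$ with observed noisy label $\tilde{Y}$, the instance-dependent noise rates can be written as
$$
\rho_{+1}(x) = \Pr\big(\tilde{Y} = -1 \mid Y = +1, X = x\big), \qquad
\rho_{-1}(x) = \Pr\big(\tilde{Y} = +1 \mid Y = -1, X = x\big),
$$
and the bounded-noise condition of BILN then amounts to assuming constants $\rho_{+1}^{\max}, \rho_{-1}^{\max} < 1$ such that $\rho_{+1}(x) \le \rho_{+1}^{\max}$ and $\rho_{-1}(x) \le \rho_{-1}^{\max}$ for all instances $x$.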