The drastic increase in data quantity often brings a severe decrease in data quality, such as incorrect label annotations, which poses a great challenge for robustly training Deep Neural Networks (DNNs). Existing \mbox{methods} for learning with label noise either employ ad-hoc heuristics or are restricted to specific noise assumptions. However, more general situations, such as instance-dependent label noise, have not been fully explored, as few studies focus on the underlying label corruption process. By categorizing instances into confusing and unconfusing ones, this paper proposes a simple yet universal probabilistic model that explicitly relates noisy labels to their instances. The resulting model can be realized by DNNs, and the training procedure is accomplished via an alternating optimization algorithm. Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements in robustness over state-of-the-art counterparts.
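To make the alternating scheme concrete, below is a minimal PyTorch sketch of one plausible instantiation: a network with a clean-label head and a per-instance "confusing" indicator, alternating between estimating soft targets for the clean labels with the weights fixed and updating the weights against those targets. The model structure, the `NoisyLabelModel`/`e_step`/`m_step` names, and the mixing rule used to form the targets are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative alternating-optimization loop for learning with
# instance-dependent label noise. NOTE: the model structure and the
# mixing rule below are assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLabelModel(nn.Module):
    """Backbone with a clean-label head and a per-instance confusion head."""
    def __init__(self, input_dim: int, num_classes: int, feat_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(input_dim, feat_dim), nn.ReLU()
        )
        self.clean_head = nn.Linear(feat_dim, num_classes)  # models p(y | x)
        self.confusion_head = nn.Linear(feat_dim, 1)        # p(x is confusing)

    def forward(self, x):
        h = self.backbone(x)
        return self.clean_head(h), torch.sigmoid(self.confusion_head(h))

def e_step(model, x, noisy_y, num_classes):
    """Estimate soft targets for the clean labels, holding weights fixed."""
    with torch.no_grad():
        logits, conf = model(x)
        probs = F.softmax(logits, dim=1)
        onehot = F.one_hot(noisy_y, num_classes).float()
        # Assumed mixing rule: unconfusing instances trust the observed
        # label; confusing ones fall back to the model's own prediction.
        return (1.0 - conf) * onehot + conf * probs

def m_step(model, optimizer, x, targets):
    """Update weights against the fixed soft targets (cross-entropy)."""
    logits, _ = model(x)
    loss = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: alternate the two steps on a random batch.
num_classes = 10
model = NoisyLabelModel(input_dim=3 * 8 * 8, num_classes=num_classes)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(32, 3, 8, 8)
noisy_y = torch.randint(0, num_classes, (32,))
for step in range(5):
    targets = e_step(model, x, noisy_y, num_classes)
    m_step(model, opt, x, targets)
```

The `e_step`/`m_step` split mirrors generic EM-style alternating optimization; in the paper, the target estimate would follow from its probabilistic model of the label corruption process rather than this heuristic mixture.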