The success of deep neural networks relies heavily on the availability of large amounts of high-quality annotated data, which, however, is difficult or expensive to obtain. The resulting labels may be class-imbalanced, noisy, or biased by human annotators. Learning unbiased classification models from such imperfectly annotated datasets is challenging, as models tend to overfit or underfit. In this work, we thoroughly investigate the popular softmax loss and margin-based loss, and offer a feasible approach to tighten the generalization error bound by maximizing the minimal sample margin. We further derive the optimality condition for this objective, which indicates how the class prototypes should be anchored. Motivated by this theoretical analysis, we propose a simple yet effective method, namely prototype-anchored learning (PAL), which can be easily incorporated into various learning-based classification schemes to handle imperfect annotation. We verify the effectiveness of PAL on class-imbalanced learning and noise-tolerant learning through extensive experiments on synthetic and real-world datasets.
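As an illustrative sketch (the notation below is assumed for exposition and is not defined in this abstract): writing $\boldsymbol{z}_i$ for the feature of sample $i$ with label $y_i$ and $\{\boldsymbol{w}_k\}_{k=1}^{K}$ for the class prototypes, the softmax loss and the sample margin are commonly taken to be
\[
\ell_i = -\log \frac{\exp(\boldsymbol{w}_{y_i}^{\top}\boldsymbol{z}_i)}{\sum_{k=1}^{K}\exp(\boldsymbol{w}_{k}^{\top}\boldsymbol{z}_i)},
\qquad
\gamma_i = \boldsymbol{w}_{y_i}^{\top}\boldsymbol{z}_i - \max_{k\neq y_i}\boldsymbol{w}_{k}^{\top}\boldsymbol{z}_i,
\]
so that the minimal sample margin $\min_i \gamma_i$ is the quantity whose maximization is referred to above when tightening the generalization error bound.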