Learning with noisy labels is an important and challenging task for training accurate deep neural networks. Commonly used loss functions, such as Cross Entropy (CE), suffer from severe overfitting to noisy labels. Robust loss functions that satisfy the symmetric condition were tailored to remedy this problem, but they in turn suffer from underfitting. In this paper, we theoretically prove that \textbf{any loss can be made robust to noisy labels} by restricting the network output to the set of permutations over a fixed vector. When the fixed vector is one-hot, we only need to constrain the output to be one-hot, which, however, produces zero gradients almost everywhere and thus makes gradient-based optimization difficult. In this work, we introduce a sparse regularization strategy to approximate the one-hot constraint, composed of a network output sharpening operation, which enforces the output distribution of the network to be sharp, and an $\ell_p$-norm ($p\le 1$) regularization, which promotes the network output to be sparse. This simple approach guarantees the robustness of arbitrary loss functions while not hindering their fitting ability. Experimental results demonstrate that our method can significantly improve the performance of commonly used loss functions in the presence of noisy labels and class imbalance, and outperforms state-of-the-art methods. The code is available at https://github.com/hitcszx/lnl_sr.
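To make the described strategy concrete, the following is a minimal sketch of how a base loss might be combined with output sharpening and an $\ell_p$-norm ($p\le 1$) penalty. The function name, the temperature `tau`, the exponent `p`, the weight `lamb`, and the way the two components are composed are assumptions for illustration, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def sparse_regularized_loss(logits, targets, base_loss=F.cross_entropy,
                            tau=0.5, p=0.1, lamb=1.0, eps=1e-8):
    """Hypothetical sketch: base loss + sparse regularization.

    - Output sharpening: raise the softmax probabilities to the power 1/tau
      (tau < 1) and renormalize, pushing the distribution toward one-hot.
    - l_p-norm penalty (p <= 1): sum_i q_i^p over a probability vector q is
      minimized exactly at one-hot vectors, so penalizing it promotes sparse
      (one-hot-like) network outputs.
    """
    probs = F.softmax(logits, dim=1)
    # Sharpening: temperature-style exponentiation followed by renormalization.
    sharpened = probs ** (1.0 / tau)
    sharpened = sharpened / sharpened.sum(dim=1, keepdim=True)
    # l_p-norm regularizer; eps avoids unbounded gradients near zero.
    lp_penalty = ((sharpened + eps) ** p).sum(dim=1).mean()
    return base_loss(logits, targets) + lamb * lp_penalty
```

In this sketch the regularizer is added to any chosen base loss (CE here by default), which reflects the claim that the approach applies to arbitrary loss functions; the relative weight `lamb` would control the trade-off between fitting and robustness.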