Deep neural networks (DNNs) are capable of perfectly fitting the training data, including memorizing noisy data. It is commonly believed that memorization hurts generalization. Therefore, many recent works propose mitigation strategies to avoid noisy data or correct memorization. In this work, we step back and ask the question: Can deep learning be robust against massive label noise without any mitigation? We provide an affirmative answer for the case of symmetric label noise: We find that certain DNNs, including under-parameterized and over-parameterized models, can tolerate massive symmetric label noise up to the information-theoretic threshold. By appealing to classical statistical theory and universal consistency of DNNs, we prove that for multiclass classification, $L_1$-consistent DNN classifiers trained under symmetric label noise can achieve Bayes optimality asymptotically if the label noise probability is less than $\frac{K-1}{K}$, where $K \ge 2$ is the number of classes. Our results show that for symmetric label noise, no mitigation is necessary for $L_1$-consistent estimators. We conjecture that for general label noise, mitigation strategies that make use of the noisy data will outperform those that ignore the noisy data.
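To make the noise model concrete, here is a minimal NumPy sketch (not from the paper) of symmetric label noise: each label is flipped with probability $p$ to one of the other $K-1$ classes chosen uniformly at random. As long as $p < \frac{K-1}{K}$, the clean class remains the most probable label under the noisy distribution, which is why the Bayes-optimal classifier is unchanged. The function name and parameters are illustrative assumptions.

```python
import numpy as np

def flip_symmetric(labels, K, p, rng=None):
    """Apply symmetric label noise: with probability p, replace each label
    by one of the other K-1 classes chosen uniformly at random."""
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    noisy = labels.copy()
    flip = rng.random(labels.shape[0]) < p
    # Draw a uniform offset in {1, ..., K-1} and shift modulo K so the
    # noisy label is never equal to the original one.
    offsets = rng.integers(1, K, size=flip.sum())
    noisy[flip] = (labels[flip] + offsets) % K
    return noisy

# Illustrative example: K = 10 classes, noise probability below the
# information-theoretic threshold (K-1)/K = 0.9.
K, p = 10, 0.85
assert p < (K - 1) / K
y_clean = np.random.default_rng(0).integers(0, K, size=100_000)
y_noisy = flip_symmetric(y_clean, K, p, rng=0)
# Even at 85% noise, the clean class stays the most likely noisy label:
# P(noisy == clean) = 1 - p = 0.15 > p / (K - 1) ≈ 0.094.
print((y_noisy == y_clean).mean())  # ≈ 0.15
```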