Increasing the size of overparameterized neural networks has been shown to improve their generalization performance. However, real-world datasets often contain a significant fraction of noisy labels, which can drastically harm the performance of models trained on them. In this work, we study how a neural network's test loss changes with model size when the training set contains noisy labels. We show that, under a sufficiently large noise-to-sample-size ratio, the generalization error eventually increases with model size. First, we provide a theoretical analysis of random feature regression and show that this phenomenon occurs because the variance of the generalization loss experiences a second ascent under a large noise-to-sample-size ratio. Then, we present extensive empirical evidence confirming that our theoretical results hold for neural networks. Furthermore, we empirically observe that the adverse effect of network size is more pronounced when robust training methods are employed to learn from noisily labeled data. Our results have important practical implications. First, larger models should be employed with extra care, particularly when trained on smaller datasets or with robust learning methods. Second, a large sample size can alleviate the effect of noisy labels and allow larger models to achieve superior performance even under noise.