Increasing the size of overparameterized neural networks has been shown to improve their generalization performance. However, real-world datasets often contain a significant fraction of noisy labels, which can drastically harm the performance of models trained on them. In this work, we study how a neural network's test loss changes with model size when the training set contains noisy labels. We show that, under a sufficiently large noise-to-sample-size ratio, the generalization error eventually increases with model size. First, we provide a theoretical analysis of random feature regression and show that this phenomenon occurs because the variance of the generalization loss experiences a second ascent under a large noise-to-sample-size ratio. Then, we present extensive empirical evidence confirming that our theoretical results hold for neural networks. Furthermore, we empirically observe that the adverse effect of network size is more pronounced when robust training methods are employed to learn from noisily labeled data. Our results have important practical implications. First, larger models should be employed with extra care, particularly when trained on smaller datasets or with robust learning methods. Second, a large sample size can alleviate the effect of noisy labels and allow larger models to achieve superior performance even under noise.