The bias-variance trade-off is a central concept in supervised learning. In classical statistics, increasing the complexity of a model (e.g., the number of parameters) reduces bias but also increases variance. Until recently, it was commonly believed that optimal performance is achieved at intermediate model complexities that strike a balance between bias and variance. Modern deep learning methods flout this dogma, achieving state-of-the-art performance using "over-parameterized" models in which the number of fit parameters is large enough to perfectly fit the training data. As a result, understanding bias and variance in over-parameterized models has emerged as a fundamental problem in machine learning. Here, we use methods from statistical physics to derive analytic expressions for bias and variance in two minimal models of over-parameterization (linear regression and two-layer neural networks with nonlinear data distributions), allowing us to disentangle properties stemming from the model architecture and the random sampling of data. In both models, increasing the number of fit parameters leads to a phase transition where the training error goes to zero and the test error diverges as a result of the variance (while the bias remains finite). Beyond this threshold, in the interpolation regime, the training error remains zero while the test error decreases. We also show that, in contrast with classical intuition, over-parameterized models can overfit even in the absence of noise and exhibit bias even if the student and teacher models match. We synthesize these results to construct a holistic understanding of generalization error and the bias-variance trade-off in over-parameterized models, and relate our results to random matrix theory.
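The phase transition described above, where test error diverges as the number of fit parameters approaches the number of training samples and then decreases in the interpolation regime, can be illustrated with a minimal least-squares experiment. The sketch below is not the paper's model; it is a simple truncated-feature setup (all variable names and the noise level are illustrative assumptions) in which a student fits only the first `p` of `d` teacher features, so varying `p` sweeps through the interpolation threshold at `p = n_train`:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 40, 200, 80  # training samples, test samples, input dimension

# Linear teacher with small label noise on the training set
w_true = rng.standard_normal(d) / np.sqrt(d)
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = X_train @ w_true + 0.1 * rng.standard_normal(n_train)
y_test = X_test @ w_true  # noiseless test labels

# Student uses only the first p features; lstsq returns the
# minimum-norm solution once the system is underdetermined (p > n_train)
for p in [5, 20, 35, 40, 45, 80]:
    w_hat, *_ = np.linalg.lstsq(X_train[:, :p], y_train, rcond=None)
    train_err = np.mean((X_train[:, :p] @ w_hat - y_train) ** 2)
    test_err = np.mean((X_test[:, :p] @ w_hat - y_test) ** 2)
    print(f"p={p:3d}  train={train_err:8.4f}  test={test_err:8.4f}")
```

Running this shows the qualitative picture from the abstract: training error drops to zero at `p = n_train`, test error spikes near that threshold because the nearly square design matrix is ill-conditioned (a variance effect), and test error falls again as `p` grows past `n_train`, where the minimum-norm solution implicitly regularizes the fit.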