A regression model with more parameters than training data points is overparametrized and can interpolate the training data. Based on the classical bias-variance tradeoff, it is commonly assumed that models which interpolate noisy training data generalize poorly. In some cases this is not true: the best models obtained are overparametrized, and the test error exhibits double descent as the model order increases. In this contribution, we provide an analysis that explains the double descent phenomenon, first reported in the machine learning literature. We focus on interpolating models derived from the minimum-norm solution to the classical least-squares problem, and we also briefly discuss model fitting using ridge regression. We derive a result, based on the behavior of the smallest singular value of the regression matrix, that explains both the location of the peak and the double descent shape of the test error as a function of model order.
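As a concrete illustration of the phenomenon described above, the following minimal sketch (the Legendre feature map, the toy data-generating process, and all constants are our own illustrative assumptions, not taken from the paper) fits minimum-norm least-squares models of increasing order to noisy data and reports the test error together with the smallest singular value of the regression matrix. Qualitatively, the test error tends to peak where that singular value collapses, near a model order equal to the number of training points, and to descend again in the overparametrized regime.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy setup (not from the paper): n_train noisy samples
# of a smooth target, evaluated on a dense test grid.
n_train, n_test, noise_std = 20, 1000, 0.5

def target(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(-1.0, 1.0, n_train)
y_train = target(x_train) + noise_std * rng.standard_normal(n_train)
x_test = np.linspace(-1.0, 1.0, n_test)
y_test = target(x_test)

def features(x, d):
    # Regression matrix with d columns (model order d),
    # built from Legendre polynomials, which stay bounded on [-1, 1].
    return np.polynomial.legendre.legvander(x, d - 1)

for d in (5, 10, 15, 19, 20, 21, 25, 40, 80, 160):
    X = features(x_train, d)
    # Minimum-norm least-squares solution theta = X^+ y; for
    # d > n_train this interpolates the noisy training data exactly.
    theta = np.linalg.pinv(X) @ y_train
    test_mse = np.mean((features(x_test, d) @ theta - y_test) ** 2)
    sigma_min = np.linalg.svd(X, compute_uv=False).min()
    print(f"d={d:4d}  sigma_min={sigma_min:.2e}  test MSE={test_mse:.3f}")
```

Printing sigma_min alongside the test error makes the connection drawn in the paper visible in this toy setting: the error peak coincides with the model orders where the smallest singular value of the regression matrix is near its minimum.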