The risk of overparameterized models, in particular deep neural networks, is often double-descent shaped as a function of the model size. Recently, it was shown that the risk as a function of the early-stopping time can also be double-descent shaped, and that this behavior can be explained as a superposition of bias-variance tradeoffs. In this paper, we show that the risk of explicitly L2-regularized models can exhibit double-descent behavior as a function of the regularization strength, both in theory and in practice. We find that for linear regression, a double-descent-shaped risk is caused by a superposition of bias-variance tradeoffs corresponding to different parts of the model, and can be mitigated by scaling the regularization strength of each part appropriately. Motivated by this result, we study a two-layer neural network and show that double descent can be eliminated by adjusting the regularization strengths of the first and second layers. Lastly, we study a 5-layer CNN and a ResNet-18 trained on CIFAR-10 with label noise and on CIFAR-100 without label noise, and demonstrate that all of these models exhibit double-descent behavior as a function of the regularization strength.
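To make the setting concrete, the following is a minimal sketch (not code from the paper) of the linear-regression experiment the abstract describes: ridge regression evaluated over a sweep of regularization strengths λ, with features split into two groups of different scales. The group sizes, scales, and noise level here are illustrative assumptions; per the abstract, whether the resulting risk curve is double-descent shaped depends on how these parts of the model interact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumed, not from the paper): features split into
# two groups with different scales, the kind of structure under which
# distinct bias-variance tradeoffs can superpose.
n_train, d1, d2 = 50, 30, 30
scales = np.concatenate([np.full(d1, 1.0), np.full(d2, 0.1)])
w_true = rng.standard_normal(d1 + d2)

def sample(n):
    """Draw n examples from the assumed linear model with Gaussian noise."""
    X = rng.standard_normal((n, d1 + d2)) * scales
    y = X @ w_true + 0.5 * rng.standard_normal(n)
    return X, y

X_train, y_train = sample(n_train)
X_test, y_test = sample(2000)

# Sweep the regularization strength and record the test risk of the
# closed-form ridge estimate w_hat = (X^T X + lam * I)^{-1} X^T y.
lambdas = np.logspace(-4, 2, 25)
d = d1 + d2
risks = []
for lam in lambdas:
    w_hat = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d),
                            X_train.T @ y_train)
    risks.append(np.mean((X_test @ w_hat - y_test) ** 2))
risks = np.array(risks)
```

Plotting `risks` against `lambdas` on a log scale gives the risk-versus-regularization curve studied in the paper; the mitigation the abstract mentions corresponds to replacing the single `lam * np.eye(d)` by a diagonal matrix with a separate strength for each feature group.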