In deep learning, the training process often finds an interpolator (a solution with zero training loss), yet the test loss is still low. This phenomenon, known as benign overfitting, is a major mystery that has received a lot of recent attention. One common mechanism for benign overfitting is implicit regularization, where the training process leads to additional properties of the interpolator, often characterized by minimizing certain norms. However, even for a simple sparse linear regression problem $y = \beta^{*\top} x + \xi$ with sparse $\beta^*$, neither the minimum $\ell_1$ nor the minimum $\ell_2$ norm interpolator gives the optimal test loss. In this work, we give a different parametrization of the model that leads to a new implicit regularization effect combining the benefits of the $\ell_1$ and $\ell_2$ interpolators. We show that training our new model via gradient descent leads to an interpolator with near-optimal test loss. Our result is based on a careful analysis of the training dynamics and provides another example of an implicit regularization effect that goes beyond norm minimization.