Sharpness-Aware Minimization (SAM) is a recent optimization framework that aims to improve the generalization of deep neural networks by finding flatter (i.e., less sharp) solutions. Motivated by SAM's empirical success, recent papers have studied the theoretical aspects of the framework. In this work, we study SAM through an implicit regularization lens and present a new theoretical explanation of why SAM generalizes well. To this end, we study the least-squares linear regression problem and establish a bias-variance trade-off for SAM's error over the course of the algorithm. We show that SAM has lower bias than Gradient Descent (GD), while having higher variance. This implies that SAM can outperform GD, especially if the algorithm is \emph{stopped early}, which is often the case when training large neural networks due to the prohibitive computational cost. We extend our results to kernel regression as well as stochastic optimization, and discuss how the implicit regularization of SAM can improve upon vanilla training.
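For concreteness, the SAM update applied to the least-squares setting studied above can be sketched as follows. This is a minimal illustration of the standard SAM rule (perturb the weights in the normalized ascent direction by radius $\rho$, then descend using the gradient at the perturbed point) on a toy synthetic regression problem; the data, step size, and $\rho$ are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

# Toy least-squares problem: minimize L(w) = ||Xw - y||^2 / (2n).
rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w):
    """Gradient of the least-squares loss at w."""
    return X.T @ (X @ w - y) / n

def gd_step(w, lr=0.1):
    """Vanilla gradient descent step."""
    return w - lr * grad(w)

def sam_step(w, lr=0.1, rho=0.05):
    """SAM step: ascend to w + eps along the normalized gradient,
    then descend using the gradient evaluated at the perturbed point."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return w - lr * grad(w + eps)

w_gd = np.zeros(d)
w_sam = np.zeros(d)
for _ in range(200):
    w_gd = gd_step(w_gd)
    w_sam = sam_step(w_sam)
```

On this strongly convex problem both iterates converge near the least-squares solution; the bias-variance distinction analyzed in the paper concerns the trajectories' behavior under early stopping and noise, which this sketch does not attempt to reproduce.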