Sharpness-aware minimization (SAM) is a recently proposed training method that seeks flat minima in deep learning, yielding state-of-the-art performance across various domains. Instead of minimizing the loss at the current weights, SAM minimizes the worst-case loss in a neighborhood of the current weights in parameter space. In this paper, we demonstrate that SAM dynamics can exhibit convergence instability near saddle points. Utilizing the qualitative theory of dynamical systems, we explain how SAM becomes stuck at a saddle point and then theoretically prove that the saddle point can become an attractor under SAM dynamics. Additionally, we show that this convergence instability can also occur in stochastic dynamical systems by characterizing the diffusion of SAM. We prove that SAM diffusion is worse than that of vanilla gradient descent in terms of saddle point escape. Further, we demonstrate that often-overlooked training tricks, namely momentum and batch size, are important to mitigate the convergence instability and achieve high generalization performance. Our theoretical and empirical results are thoroughly verified through experiments on several well-known optimization problems and benchmark tasks.
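The worst-case neighborhood loss described above is typically approximated by a two-step update: an ascent step to the (linearized) worst-case point within a radius-ρ ball, followed by a descent step using the gradient evaluated there. A minimal NumPy sketch of one such SAM step, assuming a user-supplied gradient function and illustrative hyperparameter values (`rho`, `eta` are not taken from this paper):

```python
import numpy as np

def sam_step(w, loss_grad, rho=0.05, eta=0.1):
    """One SAM update, assuming loss_grad(w) returns the loss gradient at w.

    rho is the neighborhood radius, eta the learning rate (illustrative values).
    """
    g = loss_grad(w)
    # Ascent step: move to the approximate worst-case point in the rho-ball,
    # i.e. along the normalized gradient direction.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent step: apply the gradient evaluated at the perturbed weights.
    g_adv = loss_grad(w + eps)
    return w - eta * g_adv

# Toy example: quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, lambda v: v)
# The iterate settles into a small neighborhood of the minimum at the origin.
print(np.linalg.norm(w))
```

Note that the perturbation `eps` never vanishes (its norm is fixed at ρ wherever the gradient is nonzero), which is precisely why SAM's behavior near critical points such as saddles can differ qualitatively from vanilla gradient descent.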