We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective, for most random initializations it converges to a cycle that oscillates between either side of the minimum in the direction with the largest curvature, and we provide bounds on the rate of convergence. In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step-size, on the spectral norm of the Hessian. In such cases, SAM's update may be regarded as a third derivative -- the derivative of the Hessian in the leading eigenvector direction -- that encourages drift toward wider minima.
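The oscillation described above can be reproduced numerically. The sketch below runs the standard SAM update, a gradient step taken at the adversarially perturbed point w + ρ∇L(w)/‖∇L(w)‖, on a toy convex quadratic L(w) = ½ wᵀHw. The Hessian H and the hyperparameters η and ρ are chosen purely for illustration and are not from the paper.

```python
import numpy as np

# Minimal sketch of SAM on a convex quadratic L(w) = 0.5 * w @ H @ w.
# H, eta, and rho are illustrative choices, not values from the paper.
H = np.diag([4.0, 1.0])   # largest curvature along the first coordinate
eta, rho = 0.05, 0.1      # step size and SAM perturbation radius

def grad(w):
    return H @ w

rng = np.random.default_rng(0)
w = rng.normal(size=2)    # a random init (gradient is nonzero w.p. 1)
for _ in range(2000):
    g = grad(w)
    w_adv = w + rho * g / np.linalg.norm(g)  # ascent to the perturbed point
    w = w - eta * grad(w_adv)                # descend using the perturbed gradient

print(w)
```

After the transient, the iterates do not converge to the minimum w = 0: the component along the low-curvature direction decays to zero, while the component along the leading-curvature direction settles into a two-point cycle that flips sign each step, bouncing across the minimum exactly as the abstract describes. For this 1-D effective map w ↦ (1 − ηλ)w − ηλρ·sign(w) with λ = 4, the cycle amplitude is ηλρ/(2 − ηλ) = 0.02/1.8 ≈ 0.0111.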