Heavy-Ball Momentum Method in Continuous Time and Discretization Error Analysis

This paper establishes a continuous-time approximation of the discrete Heavy-Ball (HB) momentum method in the form of a piecewise continuous differential equation whose discretization error is made explicit. Studying continuous differential equations has been a promising approach to analyzing discrete optimization methods. Yet despite the crucial role of momentum in gradient-based optimization, the gap that discretization error opens between the original discrete dynamics and their continuous-time approximations has not been comprehensively bridged. In this work, we study the HB momentum method in continuous time with particular attention to the discretization error, providing additional theoretical tools for this line of research. Specifically, we design a first-order piecewise continuous differential equation to which we add a number of counter terms that account explicitly for the discretization error. The result is a continuous-time model of the HB momentum method whose discretization error can be controlled to arbitrary order in the step size. As an application, we use this model to identify a new implicit regularization of directional smoothness and to investigate the implicit bias of HB on diagonal linear networks, illustrating how our results can be applied in deep learning. Numerical experiments further support our theoretical findings.
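
To make the notion of discretization error concrete, the following Python sketch compares HB iterates with a continuous-time model on a one-dimensional quadratic. It assumes f(x) = a x²/2, the HB update x_{k+1} = x_k - η∇f(x_k) + β(x_k - x_{k-1}) started at rest (x_{-1} = x_0), and the commonly used uncorrected first-order flow dx/dt = -∇f(x)/(1-β) with time t = kη. The helper names `heavy_ball` and `max_gap_to_flow` are hypothetical, and the paper's counter terms are not reproduced here; the sketch only shows the kind of error such terms are meant to account for.

```python
import math


def heavy_ball(grad, x0, lr, beta, steps):
    """Heavy-Ball iterates x_{k+1} = x_k - lr*grad(x_k) + beta*(x_k - x_{k-1}).

    Assumption: the method starts at rest, i.e. x_{-1} = x_0 (zero initial momentum).
    """
    xs = [x0]
    prev, x = x0, x0
    for _ in range(steps):
        # The right-hand side is evaluated with the old (prev, x) before reassignment.
        prev, x = x, x - lr * grad(x) + beta * (x - prev)
        xs.append(x)
    return xs


def max_gap_to_flow(a, x0, lr, beta, horizon):
    """Max |x_k - x(k*lr)| over [0, horizon] for f(x) = a*x**2/2.

    x(t) = x0 * exp(-a*t/(1-beta)) solves the uncorrected first-order flow
    dx/dt = -f'(x)/(1-beta), i.e. a model with no counter terms.
    """
    steps = round(horizon / lr)
    xs = heavy_ball(lambda x: a * x, x0, lr, beta, steps)
    rate = a / (1.0 - beta)
    return max(abs(xk - x0 * math.exp(-rate * k * lr)) for k, xk in enumerate(xs))


if __name__ == "__main__":
    # Halving the step size should roughly halve the gap (first-order accuracy).
    for lr in (0.02, 0.01, 0.005):
        print(f"lr={lr}: max gap = {max_gap_to_flow(1.0, 1.0, lr, 0.5, 2.0):.5f}")
```

Under these assumptions, the printed gaps shrink roughly in proportion to the step size, so the uncorrected flow is only first-order accurate in η. Part of the gap comes from the zero initial momentum, which makes the early HB iterates lag behind the flow.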
