The momentum acceleration technique is widely adopted in many optimization algorithms. However, there is no theoretical answer to how momentum affects the generalization performance of these algorithms. This paper studies this problem by analyzing the implicit regularization of momentum-based optimization. We prove that, on linear classification problems with separable data and exponential-tailed loss, gradient descent with momentum (GDM) converges to the L2 max-margin solution, the same solution reached by vanilla gradient descent. This means that gradient descent with momentum acceleration still converges to a low-complexity model, which guarantees its generalization. We then analyze the stochastic and adaptive variants of GDM (i.e., SGDM and deterministic Adam) and show that they also converge to the L2 max-margin solution. Technically, to overcome the difficulty caused by error accumulation when analyzing momentum, we construct new potential functions to analyze the gap between the model parameters and the max-margin solution. Numerical experiments are conducted to support our theoretical results.
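As a rough illustration of the main claim (a minimal sketch of our own, not code from the paper; the toy data, step size, and momentum coefficient are assumptions chosen for demonstration), the snippet below runs vanilla gradient descent and heavy-ball GDM on a separable two-dimensional problem with exponential loss and checks that their normalized iterates align in direction, consistent with both converging to the L2 max-margin separator.

```python
import numpy as np

# Toy separable data: two Gaussian clusters in R^2 with labels +1 / -1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 2.0], 0.5, (20, 2)),
               rng.normal([-2.0, -2.0], 0.5, (20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

def grad(w):
    # Gradient of the exponential loss L(w) = sum_i exp(-y_i <x_i, w>).
    m = y * (X @ w)
    return -(np.exp(-m) * y) @ X

def run(lr=0.01, beta=0.0, steps=200_000):
    # Heavy-ball GDM; beta = 0 recovers vanilla gradient descent.
    w, v = np.zeros(2), np.zeros(2)
    for _ in range(steps):
        v = beta * v + grad(w)      # momentum buffer
        w = w - lr * v
    return w / np.linalg.norm(w)    # only the direction matters

d_gd = run(beta=0.0)    # vanilla gradient descent
d_gdm = run(beta=0.9)   # gradient descent with momentum
print("cosine(GD, GDM) =", d_gd @ d_gdm)  # expected to approach 1
```

Under the theory summarized in the abstract, both directions should converge to the L2 max-margin direction, so the printed cosine similarity should be close to 1.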