Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training. Yet, in the stochastic setting, momentum interferes with gradient noise, often requiring specific step size and momentum choices in order to guarantee convergence, let alone acceleration. Proximal point methods, on the other hand, have gained much attention due to their numerical stability and robustness to imperfect tuning. Their stochastic accelerated variants, though, have received limited attention: how momentum interacts with the stability of (stochastic) proximal point methods remains largely unstudied. To address this, we focus on the convergence and stability of the stochastic proximal point algorithm with momentum (SPPAM), and show that, under proper hyperparameter tuning, SPPAM converges linearly to a neighborhood of the solution faster than the stochastic proximal point algorithm (SPPA), with a better contraction factor. In terms of stability, we show that SPPAM depends on problem constants more favorably than SGDM, allowing a wider range of step sizes and momentum parameters that lead to convergence.
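For orientation, a minimal sketch of the updates being compared, assuming the standard heavy-ball form of momentum and placing the momentum extrapolation inside the proximal step for SPPAM (the precise formulation used in the analysis is given in the paper body); here $\eta > 0$ is the step size, $\beta \in [0,1)$ the momentum parameter, and $f_{i_t}$ the sampled component function:
\[
\text{SGDM:}\quad x_{t+1} = x_t - \eta \nabla f_{i_t}(x_t) + \beta (x_t - x_{t-1}),
\]
\[
\text{SPPA:}\quad x_{t+1} = \operatorname{prox}_{\eta f_{i_t}}(x_t) = \arg\min_x \Big\{ f_{i_t}(x) + \tfrac{1}{2\eta}\|x - x_t\|^2 \Big\},
\]
\[
\text{SPPAM:}\quad x_{t+1} = \operatorname{prox}_{\eta f_{i_t}}\big(x_t + \beta (x_t - x_{t-1})\big).
\]
The implicit (proximal) step is what underlies the stability advantage discussed above: the update solves a regularized subproblem rather than taking an explicit gradient step, which tempers the interaction between momentum and gradient noise.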