Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training. Yet, in the stochastic setting, momentum interferes with gradient noise, often forcing specific step size and momentum choices in order to guarantee convergence, let alone acceleration. Proximal point methods, on the other hand, have gained much attention due to their numerical stability and robustness to imperfect tuning. Their stochastic accelerated variants, though, have received limited attention: how momentum interacts with the stability of (stochastic) proximal point methods remains largely unstudied. To address this, we focus on the convergence and stability of the stochastic proximal point algorithm with momentum (SPPAM), and show that, under proper hyperparameter tuning, SPPAM attains a faster linear convergence rate than the stochastic proximal point algorithm (SPPA), with a better contraction factor. In terms of stability, we show that SPPAM depends on problem constants more favorably than SGDM, allowing a wider range of step sizes and momentum values that lead to convergence.
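For context, one natural way to combine a momentum term with the stochastic proximal point step is sketched below. This is an illustrative formulation under assumed notation (step size $\eta > 0$, momentum $\beta \in [0,1)$, sampled component $f_{i_t}$); the paper's exact placement of the momentum term and its hyperparameter conventions may differ.

% Sketch of a momentum-augmented stochastic proximal point step (assumed form):
% the extrapolated point x_t + \beta (x_t - x_{t-1}) is fed into the proximal
% operator of the sampled component f_{i_t}.
\[
x_{t+1} \;=\; \operatorname{prox}_{\eta f_{i_t}}\!\big(x_t + \beta\,(x_t - x_{t-1})\big)
\;=\; \arg\min_{x}\Big\{ f_{i_t}(x) + \tfrac{1}{2\eta}\,\big\|x - x_t - \beta\,(x_t - x_{t-1})\big\|^2 \Big\},
\]
% \beta = 0 recovers the plain stochastic proximal point step (SPPA), while
% replacing the implicit proximal step with an explicit gradient step gives
% heavy-ball SGDM: x_{t+1} = x_t - \eta \nabla f_{i_t}(x_t) + \beta (x_t - x_{t-1}).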