We consider the momentum stochastic gradient descent scheme (MSGD) and its continuous-in-time counterpart in the context of non-convex optimization. We prove almost sure exponential convergence of the objective function value for target functions that are Lipschitz continuous and satisfy the Polyak-Łojasiewicz inequality on the relevant domain, under assumptions on the stochastic noise motivated by overparameterized supervised learning applications. Moreover, we optimize the convergence rate over the set of friction parameters and show that the MSGD process itself almost surely converges.
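For concreteness, one common parameterization of the MSGD recursion, its continuous-in-time (heavy-ball) counterpart, and the Polyak-Łojasiewicz inequality is sketched below; the step size $\eta$, friction parameter $\gamma$, noise terms $\xi_n$, and PL constant $\mu$ are illustrative notation and not necessarily those used in the paper:
\[
v_{n+1} = (1 - \gamma\eta)\, v_n - \eta\, \nabla f(\theta_n) - \eta\, \xi_{n+1},
\qquad
\theta_{n+1} = \theta_n + \eta\, v_{n+1},
\]
with the continuous-in-time counterpart given (up to the stochastic perturbation) by the second-order dynamics
\[
\ddot{\theta}_t + \gamma\, \dot{\theta}_t + \nabla f(\theta_t) = 0 ,
\]
and the Polyak-Łojasiewicz inequality with constant $\mu > 0$ reading
\[
\tfrac{1}{2}\, \|\nabla f(\theta)\|^2 \;\ge\; \mu \bigl( f(\theta) - \inf f \bigr)
\]
on the relevant domain.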