In this paper, a general stochastic optimization procedure is studied, unifying several variants of stochastic gradient descent, including, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is seen as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is the convergence proof of S-NAG in a non-convex setting. Under some assumptions, the convergence rate is provided in the form of a Central Limit Theorem. Finally, the non-convergence of the algorithm to undesired critical points, such as local maxima or saddle points, is established. Here, the main ingredient is a new avoidance-of-traps result for non-autonomous settings, which is of independent interest.
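To fix ideas, the display below records what a noisy Euler discretization of a non-autonomous ODE $\dot z(t) = h(t, z(t))$ looks like in generic stochastic-approximation notation, together with the standard Adam recursion as one well-known member of the family of methods mentioned above. This is only an illustrative sketch: the symbols ($\gamma_n$, $h$, $\Delta M_{n+1}$, $\beta_1$, $\beta_2$, $\varepsilon$) are generic and not taken from the paper, and the display is not the authors' exact scheme.

% Illustrative sketch only; generic notation, not the paper's.
\begin{align*}
  \text{Noisy Euler step:}\quad
    & z_{n+1} = z_n + \gamma_{n+1}\bigl( h(\tau_n, z_n) + \Delta M_{n+1} \bigr),
      \qquad \tau_n = \sum_{k=1}^{n} \gamma_k, \\
  \text{Adam (special case):}\quad
    & m_{n+1} = \beta_1 m_n + (1-\beta_1)\, g_{n+1}, \qquad
      v_{n+1} = \beta_2 v_n + (1-\beta_2)\, g_{n+1}^{\odot 2}, \\
    & x_{n+1} = x_n - \gamma\,
      \frac{m_{n+1}/(1-\beta_1^{\,n+1})}{\sqrt{v_{n+1}/(1-\beta_2^{\,n+1})} + \varepsilon},
\end{align*}

where $h$ is the drift of the non-autonomous ODE, $\Delta M_{n+1}$ is a martingale-difference noise term, $g_{n+1}$ is a stochastic gradient of the objective evaluated at $x_n$, and the square root and division in the Adam update act coordinatewise.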