This paper presents a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time-inhomogeneous Stochastic Differential Equation (SDE) using an appropriate coupling. In the specific case of batch noise, we refine our results using recent advances in Stein's method. Then, motivated by recent analyses of deterministic and stochastic optimization methods through their continuous counterparts, we study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds. To this end, we develop new comparison techniques which are of independent interest. Adapting these techniques to the discrete setting, we show that the same results hold for the corresponding SGD sequences. In our analysis, we notably improve the non-asymptotic bounds for SGD in the convex setting under weaker assumptions than those considered in previous works. Finally, we also establish finite-time convergence results under various conditions, including relaxations of the famous {\L}ojasiewicz inequality, which can be applied to a class of non-convex functions.
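For concreteness, here is a minimal sketch of the two objects the abstract refers to, assuming the standard formulation of SGD with non-increasing step sizes $(\gamma_k)_{k \ge 1}$ and a generic time-inhomogeneous SDE counterpart; the exact noise model and the precise scaling of the diffusion term are those specified in the body of the paper, not fixed by this sketch.
\begin{align*}
% SGD recursion with non-increasing step sizes (assumed standard form)
X_{k+1} &= X_k - \gamma_{k+1}\bigl[\nabla f(X_k) + \varepsilon_{k+1}(X_k)\bigr],
\qquad \gamma_{k+1} \le \gamma_k, \\
% Time-inhomogeneous SDE counterpart: drift and diffusion modulated by a step-size function gamma(t)
\mathrm{d}\mathbf{X}_t &= -\gamma(t)\,\nabla f(\mathbf{X}_t)\,\mathrm{d}t
+ \gamma(t)\,\Sigma^{1/2}(\mathbf{X}_t)\,\mathrm{d}B_t,
\end{align*}
where $f$ denotes the objective, $(\varepsilon_k)_{k \ge 1}$ the gradient noise with covariance $\Sigma$, and $(B_t)_{t \ge 0}$ a standard Brownian motion.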