The diffusion approximation of stochastic gradient descent (SGD) in the current literature is valid only on a finite time interval. In this paper, we establish a uniform-in-time diffusion approximation of SGD, assuming only that the expected loss is strongly convex, together with some other mild conditions, and without assuming that each random loss function is convex. The main technique is to establish exponential decay rates for the derivatives of the solution to the backward Kolmogorov equation. The uniform-in-time approximation allows us to study the asymptotic behavior of SGD via the continuous stochastic differential equation (SDE), even when the random objective function $f(\cdot;\xi)$ is not strongly convex.
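
For orientation, the objects named above can be written out as follows. This is a sketch of the standard first-order diffusion-approximation setup, not a statement taken from the paper: the step size $\eta$, the expected loss $F$, the gradient-noise covariance $\Sigma$, the test function $\varphi$, the constant $C$, and the order of the error bound in $\eta$ are all assumptions introduced here for illustration.

```latex
% SGD iteration with i.i.d. samples \xi_k, and the expected loss (assumed notation)
x_{k+1} = x_k - \eta \, \nabla f(x_k;\xi_k),
\qquad F(x) = \mathbb{E}_{\xi}\, f(x;\xi)

% Approximating SDE on the time scale t = k\eta, with gradient-noise covariance \Sigma
dX_t = -\nabla F(X_t)\, dt + \sqrt{\eta}\, \Sigma(X_t)^{1/2}\, dW_t,
\qquad \Sigma(x) = \operatorname{Cov}_{\xi}\!\big(\nabla f(x;\xi)\big)

% Backward Kolmogorov equation for u(x,t) = E[\varphi(X_t) | X_0 = x]
\partial_t u = -\nabla F \cdot \nabla u + \frac{\eta}{2}\, \Sigma : \nabla^2 u,
\qquad u(x,0) = \varphi(x)

% Shape of a uniform-in-time weak error bound: C does not depend on k
\sup_{k \ge 0} \big| \mathbb{E}\,\varphi(x_k) - \mathbb{E}\,\varphi(X_{k\eta}) \big| \le C\,\eta
```

In a finite-time result the constant would depend on a time horizon $T$ and the supremum would run only over $k\eta \le T$. Exponential decay of the derivatives of $u$ in $t$ is what would keep the accumulated one-step errors summable, so that $C$ can be chosen independently of $k$.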