Stochastic gradient descent is an optimisation method that combines classical gradient descent with random subsampling within the target functional. In this work, we introduce the stochastic gradient process as a continuous-time representation of stochastic gradient descent. The stochastic gradient process is a dynamical system coupled with a continuous-time Markov process living on a finite state space. The dynamical system, a gradient flow, represents the gradient descent part; the process on the finite state space represents the random subsampling. Processes of this type are, for instance, used to model clonal populations in fluctuating environments. After introducing it, we study theoretical properties of the stochastic gradient process: we show that it converges weakly to the gradient flow with respect to the full target function as the learning rate approaches zero. We give conditions under which the stochastic gradient process with constant learning rate is exponentially ergodic in the Wasserstein sense. Then we study the case where the learning rate goes to zero sufficiently slowly and the single target functions are strongly convex. In this case, the process converges weakly to the point mass concentrated in the global minimum of the full target function, indicating consistency of the method. We conclude with a discussion of discretisation strategies for the stochastic gradient process and numerical experiments.
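To make the construction above concrete, the following is a minimal Python sketch of such a process for a toy least-squares problem: the data index is resampled uniformly after exponential waiting times whose mean plays the role of the learning rate, and between jumps the parameter follows the gradient flow of the currently selected single target function, integrated here with a simple explicit Euler scheme. The choice of jump rate and the toy single targets f_i are illustrative assumptions, not the exact construction analysed in the paper.

```python
import numpy as np

# Toy problem: full target f(theta) = (1/N) * sum_i f_i(theta) with
# single targets f_i(theta) = 0.5 * (a_i * theta - b_i)^2.
rng = np.random.default_rng(0)
N = 50
a = rng.normal(size=N)
b = rng.normal(size=N)

def grad_single(theta, i):
    """Gradient of the single target f_i."""
    return a[i] * (a[i] * theta - b[i])

def simulate(theta0, learning_rate, t_end, dt=1e-3):
    """One path of an (assumed) stochastic gradient process:
    resample the data index after Exp(mean = learning_rate) waiting times,
    follow the gradient flow of the selected f_i between jumps."""
    theta, t = theta0, 0.0
    i = rng.integers(N)                       # initial data index
    next_jump = rng.exponential(learning_rate)
    while t < t_end:
        if t >= next_jump:                    # jump of the index process
            i = rng.integers(N)
            next_jump = t + rng.exponential(learning_rate)
        theta -= dt * grad_single(theta, i)   # Euler step of the gradient flow
        t += dt
    return theta

theta_hat = simulate(theta0=0.0, learning_rate=0.05, t_end=20.0)
theta_star = np.dot(a, b) / np.dot(a, a)      # minimiser of the full target
print(theta_hat, theta_star)
```

With a small mean waiting time the sampled path stays close to the full gradient flow and the final iterate lands near the minimiser of the full target, in line with the small-learning-rate limit described in the abstract.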