In this paper, we investigate a general class of stochastic gradient descent (SGD) algorithms, called conditioned SGD, based on a preconditioning of the gradient direction. Under mild assumptions, namely the $L$-smoothness of the non-convex objective function and a weak growth condition on the noise, we establish almost sure convergence and asymptotic normality for a broad class of conditioning matrices. In particular, when the conditioning matrix is an estimate of the inverse Hessian at the optimal point, the algorithm is proved to be asymptotically optimal. The benefits of this approach are validated on simulated and real datasets.
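To make the preconditioned update rule concrete, the following minimal sketch runs conditioned SGD on a toy quadratic objective with noisy gradients. The step rule $\theta_{k+1} = \theta_k - \gamma_k C_k g_k$ matches the description above; the specific objective, step sizes $\gamma_k$, and damped inverse-Hessian choice for $C_k$ are illustrative assumptions, not the authors' exact scheme.

```python
# Minimal sketch of a conditioned SGD step: theta <- theta - gamma * C @ grad,
# where C is a conditioning (preconditioning) matrix. The toy objective
# f(theta) = 0.5 * theta^T A theta - b^T theta, the step sizes, and the
# damped inverse-Hessian conditioner below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = np.diag(np.linspace(1.0, 10.0, d))   # Hessian of the toy objective
b = rng.normal(size=d)
theta = np.zeros(d)

for k in range(1, 5001):
    gamma = 1.0 / k                          # decreasing step size
    noise = rng.normal(scale=0.1, size=d)
    grad = A @ theta - b + noise             # noisy gradient of the objective
    # Illustrative conditioner: damped inverse of the (here known) Hessian.
    # In practice C would be estimated from data, e.g. recursively.
    C = np.linalg.inv(A + (1.0 / k) * np.eye(d))
    theta = theta - gamma * C @ grad         # conditioned SGD update

print("estimate:", theta)
print("optimum :", np.linalg.solve(A, b))
```

With a conditioner approaching the inverse Hessian at the optimum, the iterates behave like a stochastic Newton method, which is the regime in which the abstract claims asymptotic optimality; setting `C = np.eye(d)` instead recovers plain SGD for comparison.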