In this paper, we investigate a general class of stochastic gradient descent (SGD) algorithms, called conditioned SGD, based on a preconditioning of the gradient direction. Using a discrete-time approach with martingale tools, we establish the weak convergence of the rescaled sequence of iterates for a broad class of conditioning matrices including stochastic first-order and second-order methods. Almost sure convergence results, which may be of independent interest, are also presented. When the conditioning matrix is an estimate of the inverse Hessian, the algorithm is proved to be asymptotically optimal. For the sake of completeness, we provide a practical procedure to achieve this minimum variance.
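To fix ideas, a conditioned SGD step preconditions the stochastic gradient with a matrix before taking the usual Robbins-Monro step. Below is a minimal sketch, assuming the standard preconditioned update x_{k+1} = x_k - gamma_k * C_k * g_k with C_k a regularized running estimate of the inverse Hessian, applied to a toy least-squares problem; the variable names, step sizes, regularization constant, and objective are illustrative choices, not taken from the paper.

```python
# Sketch of a conditioned SGD update on a toy least-squares problem.
# Assumption: the iteration x_{k+1} = x_k - gamma_k * C_k * g_k, where
# g_k is a stochastic gradient and C_k estimates the inverse Hessian.
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: minimize E[(a^T x - b)^2 / 2] over random pairs (a, b).
d = 5
x_true = rng.normal(size=d)

def sample_gradient_and_hessian(x):
    """Return one stochastic gradient and the per-sample Hessian a a^T."""
    a = rng.normal(size=d)
    b = a @ x_true + 0.1 * rng.normal()
    residual = a @ x - b
    return residual * a, np.outer(a, a)

x = np.zeros(d)
H_hat = np.eye(d)   # running estimate of the Hessian
eps = 1e-3          # regularization keeping C_k well-conditioned (illustrative)

for k in range(1, 5001):
    g, H_sample = sample_gradient_and_hessian(x)
    # Running average of per-sample Hessians as the curvature estimate.
    H_hat += (H_sample - H_hat) / k
    # Conditioning matrix: regularized inverse of the Hessian estimate.
    C = np.linalg.inv(H_hat + eps * np.eye(d))
    gamma = 1.0 / k   # standard Robbins-Monro step size
    x -= gamma * C @ g

print("estimation error:", np.linalg.norm(x - x_true))
```

With C_k the identity this reduces to plain SGD; using the inverse-Hessian estimate is the second-order choice for which the abstract states the algorithm attains the minimum asymptotic variance.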