Advanced optimization algorithms such as Newton's method and AdaGrad benefit from second-order derivatives or second-order statistics to obtain better descent directions and faster convergence rates. At their heart, such algorithms need to compute the inverse or inverse square root of a matrix whose size is quadratic in the dimensionality of the search space. For high-dimensional search spaces, computing the matrix inverse or inverse square root becomes prohibitively expensive, which in turn calls for approximate methods. In this work, we propose a new matrix approximation method that divides a matrix into blocks and represents each block by one or two numbers. The method allows efficient computation of the matrix inverse and inverse square root. We apply our method to AdaGrad in training deep neural networks. Experiments show encouraging results compared with the diagonal approximation.
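The abstract does not spell out how a block is represented by one or two numbers; one plausible reading, used here purely as an illustration, is that a block is modeled with two numbers as a·I + b·11ᵀ, which admits a closed-form inverse square root because such a matrix has only two distinct eigenvalues. A minimal NumPy sketch under that assumption (the fitting rule and the block size are hypothetical, not the paper's prescription):

```python
import numpy as np

def two_number_inv_sqrt(a, b, n):
    """Inverse square root of the n x n matrix a*I + b*J, where J is all-ones.

    a*I + b*J has eigenvalue a + n*b on the all-ones vector and eigenvalue a on
    its orthogonal complement, so x -> x**-0.5 can be applied to the two
    eigenvalues directly instead of factorizing an n x n matrix.
    """
    P = np.full((n, n), 1.0 / n)           # projector onto the all-ones direction
    return a ** -0.5 * (np.eye(n) - P) + (a + n * b) ** -0.5 * P

# Illustration: fit (a, b) to one diagonal block of a second-order statistics
# matrix by matching its mean diagonal and mean off-diagonal entries.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 8))
G = X.T @ X / 1000 + 1e-3 * np.eye(8)      # hypothetical 8 x 8 gradient-statistics block
n = G.shape[0]
mean_diag = np.trace(G) / n
mean_off = (G.sum() - np.trace(G)) / (n * (n - 1))
a, b = mean_diag - mean_off, mean_off      # block modeled as a*I + b*J

approx = two_number_inv_sqrt(a, b, n)

# Reference: exact inverse square root via eigendecomposition, O(n^3) per block.
w, V = np.linalg.eigh(G)
exact = V @ np.diag(w ** -0.5) @ V.T
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

The appeal of such a parameterization is that the per-block cost of the inverse square root drops from cubic in the block size to the cost of forming one rank-one projector, while still capturing more correlation structure than a purely diagonal approximation.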