This work proposes a time-efficient Natural Gradient Descent method, called TENGraD, with linear convergence guarantees. Computing the inverse of the neural network's Fisher information matrix is expensive in NGD because the Fisher matrix is large. Approximate NGD methods such as KFAC attempt to improve NGD's running time and practical applicability by reducing this inversion cost through approximation. However, the approximations do not significantly reduce the overall time and lead to less accurate parameter updates and a loss of curvature information. TENGraD improves the time efficiency of NGD by computing Fisher block inverses with a computationally efficient covariance factorization and reuse method. It computes the inverse of each block exactly using the Woodbury matrix identity, preserving curvature information while admitting a fast (linear) convergence rate. Our experiments on image classification tasks with state-of-the-art deep neural architectures on CIFAR-10, CIFAR-100, and Fashion-MNIST show that TENGraD significantly outperforms state-of-the-art NGD methods, and often stochastic gradient descent, in wall-clock time.
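As an illustration of the block inversion the abstract refers to (a sketch under assumed notation; the exact factorization and damping used by TENGraD are defined in the paper body), the Woodbury matrix identity inverts a damped low-rank Fisher block F = \lambda I_p + U U^\top, with U \in \mathbb{R}^{p \times n} and n \ll p, at the cost of an n \times n solve rather than a p \times p one:

\[ (\lambda I_p + U U^\top)^{-1} \;=\; \tfrac{1}{\lambda}\Bigl( I_p \;-\; U\,(\lambda I_n + U^\top U)^{-1} U^\top \Bigr). \]

Only the small Gram matrix U^\top U must be formed and factored, which is consistent with the covariance factorization and reuse strategy described above.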