In this paper, we develop an efficient sketchy empirical natural gradient method (SENG) for large-scale deep learning problems. The empirical Fisher information matrix is usually low-rank since sampling is only practical on a small batch of data at each iteration. Although the corresponding natural gradient direction lies in a small subspace, both the computational cost and the memory requirement remain intractable due to the high dimensionality. We design randomized techniques for different neural network structures to resolve these challenges. For layers with a reasonable dimension, sketching can be performed on a regularized least squares subproblem. Otherwise, since the gradient is a vectorization of the product of two matrices, we apply sketching to low-rank approximations of these matrices to compute the most expensive parts. A distributed version of SENG is also developed for extremely large-scale applications. Global convergence to stationary points is established under mild assumptions, and fast linear convergence is analyzed in the neural tangent kernel (NTK) regime. Extensive experiments on convolutional neural networks show the competitiveness of SENG compared with state-of-the-art methods. On ResNet-50 with ImageNet-1k, SENG achieves 75.9\% Top-1 testing accuracy within 41 epochs. Experiments on distributed large-batch training show reasonable scaling efficiency.
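To make the low-rank structure concrete, the following is a minimal, illustrative NumPy sketch (not the paper's implementation): it treats the empirical Fisher as $F \approx UU^\top$ with $U$ the $d \times n$ matrix of per-sample gradients ($n \ll d$), solves the regularized system $(F + \lambda I)^{-1} g$ through the Woodbury identity so that only an $n \times n$ system is ever factored, and optionally applies a Gaussian sketch to $U$ before forming that small system. The function name, the choice of Gaussian sketching, and the default $\lambda$ are assumptions made for illustration only.

```python
import numpy as np

def sketched_ng_direction(U, g, lam=0.1, sketch_dim=None, rng=None):
    """Illustrative natural-gradient-like direction with a low-rank
    empirical Fisher F ~= U @ U.T, where U is d x n and n << d.

    Solves (U U^T + lam I)^{-1} g via the Woodbury identity; if
    sketch_dim is given, U is compressed with a Gaussian sketch
    before forming the small n x n system (an assumed, simplified
    stand-in for the randomized techniques described in the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    d, n = U.shape
    if sketch_dim is not None and sketch_dim < d:
        # Gaussian row sketch: (S U)^T (S U) approximates U^T U.
        S = rng.standard_normal((sketch_dim, d)) / np.sqrt(sketch_dim)
        SU = S @ U                       # sketch_dim x n
        small = SU.T @ SU
    else:
        small = U.T @ U                  # exact n x n Gram matrix
    # Woodbury: (U U^T + lam I)^{-1} g
    #         = (g - U (lam I + U^T U)^{-1} U^T g) / lam
    rhs = U.T @ g                        # n-vector
    coeff = np.linalg.solve(lam * np.eye(n) + small, rhs)
    return (g - U @ coeff) / lam
```

Because every dense operation involves only $d \times n$ or $n \times n$ arrays, the cost stays linear in the parameter dimension $d$, which is the motivation for exploiting the low-rank structure in the first place.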