Natural gradient descent (NGD) has provided deep insights and powerful tools for training deep neural networks. However, computing the Fisher information matrix becomes increasingly difficult as network structures grow larger and more complex. This paper proposes a new optimization method whose main idea is to faithfully reproduce natural gradient optimization by reconstructing the network. More specifically, we reconstruct the structure of the deep neural network and optimize the new network with traditional gradient descent (GD). Training the reconstructed network with GD achieves the effect of optimizing the original network with natural gradient descent. Experimental results show that our method accelerates the convergence of deep network models and achieves better performance than GD while sharing its computational simplicity.
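For reference, a minimal sketch of the underlying idea in our own notation (the abstract itself defines no symbols, so the following is an assumption about the standard formulation, not the paper's exact construction). Natural gradient descent preconditions the gradient with the inverse Fisher information matrix:

\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1}\, \nabla_\theta \mathcal{L}(\theta_t), \qquad F(\theta) = \mathbb{E}_{p_\theta}\!\left[ \nabla_\theta \log p_\theta \,\nabla_\theta \log p_\theta^{\top} \right].

If the network is reparameterized as \theta = \phi(w) with Jacobian J = \partial\theta / \partial w chosen so that J J^{\top} \approx F^{-1}, then plain gradient descent in w, namely w_{t+1} = w_t - \eta\, J^{\top} \nabla_\theta \mathcal{L}, moves \theta by approximately -\eta\, J J^{\top} \nabla_\theta \mathcal{L} \approx -\eta\, F^{-1} \nabla_\theta \mathcal{L}; that is, GD on the reparameterized network mimics NGD on the original.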