In this paper, we propose a novel model-parallel learning method, called local critic training, which trains neural networks using additional modules called local critic networks. The main network is divided into several layer groups, and each layer group is updated through error gradients estimated by the corresponding local critic network. We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In addition, we demonstrate that the proposed method is guaranteed to converge to a critical point. We also show that networks trained by the proposed method can be used for structural optimization. Experimental results show that our method achieves satisfactory performance, greatly reduces training time, and decreases memory consumption per machine. Code is available at https://github.com/hjdw2/Local-critic-training.
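To make the decoupled update idea concrete, the following is a minimal PyTorch sketch of one training step under the assumptions stated in the comments. The module names (group1, group2, critic1), the toy dimensions, and the critic objective (regressing the true loss) are illustrative choices, not the authors' implementation; the repository linked above contains the actual code.

```python
# Hedged sketch of local critic training: each layer group is updated using an
# error gradient estimated by a small auxiliary (local critic) network, so the
# earlier group does not need to wait for the later group's backward pass.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two layer groups forming the main network (toy sizes, assumed for illustration).
group1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
group2 = nn.Sequential(nn.Linear(64, 10))  # final layer group

# Local critic for group1: maps group1's activation directly to an estimate of
# the final output, providing an estimated error gradient for group1.
critic1 = nn.Sequential(nn.Linear(64, 10))

opt_g1 = torch.optim.SGD(group1.parameters(), lr=0.1)
opt_g2 = torch.optim.SGD(group2.parameters(), lr=0.1)
opt_c1 = torch.optim.SGD(critic1.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))

# Forward through the first group.
h1 = group1(x)

# Update group1 with the gradient of the critic's estimated loss.
loss_g1 = loss_fn(critic1(h1), y)
opt_g1.zero_grad()
loss_g1.backward()
opt_g1.step()

# Update the final group with the true loss; detaching h1 decouples the groups.
out = group2(h1.detach())
loss_g2 = loss_fn(out, y)
opt_g2.zero_grad()
loss_g2.backward()
opt_g2.step()

# Train the critic so its estimated loss tracks the true loss (one plausible
# choice of critic objective; the paper's exact formulation may differ).
loss_c1 = (loss_fn(critic1(h1.detach()), y) - loss_g2.detach()).pow(2)
opt_c1.zero_grad()
loss_c1.backward()
opt_c1.step()
```

In a model-parallel setting, each layer group and its local critic would live on a separate machine, and the detached activations would be the only tensors communicated forward, which is what allows the per-group updates to proceed without a full end-to-end backward pass.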