A commonly cited inefficiency of neural network training using back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can alleviate this issue have been proposed. In this context, we consider a simple alternative based on minimal feedback, which we call Decoupled Greedy Learning (DGL). It is based on a classic greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification. We consider an optimization of this objective that permits us to decouple the layer training, allowing layers or modules in the network to be trained with potentially linear parallelization. Using a replay buffer, we show that this approach can be extended to asynchronous settings, where modules can operate and continue to update even with large communication delays. To address bandwidth and memory issues, we propose an approach based on online vector quantization, which drastically reduces the communication bandwidth between modules and the memory required for replay buffers. We show theoretically and empirically that this approach converges, and we compare it to sequential solvers. We demonstrate the effectiveness of DGL against alternative approaches on the CIFAR-10 dataset and on the large-scale ImageNet dataset.
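To make the decoupling concrete, the following is a minimal PyTorch sketch of the greedy, locally supervised training loop described above; it is not the paper's implementation, and all module and variable names are illustrative. Each module owns an auxiliary classifier that supplies a local loss, `detach()` blocks gradients from flowing across module boundaries, and a per-module replay buffer stands in for the asynchronous communication channel.

```python
import torch
import torch.nn as nn
from collections import deque

class DGLModule(nn.Module):
    """One greedily trained block with its own auxiliary classifier."""
    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())
        self.aux = nn.Sequential(  # auxiliary head providing the local loss
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(out_ch, num_classes))

    def forward(self, x):
        h = self.block(x)
        return h, self.aux(h)

modules = [DGLModule(3, 16, 10), DGLModule(16, 32, 10)]
opts = [torch.optim.SGD(m.parameters(), lr=0.1) for m in modules]
buffers = [deque(maxlen=50) for _ in modules]  # replay buffers decouple modules

def local_step(j, x, y):
    """One local update for module j; no global backward pass is needed."""
    h, logits = modules[j](x)
    loss = nn.functional.cross_entropy(logits, y)
    opts[j].zero_grad(); loss.backward(); opts[j].step()
    return h.detach()  # pass forward only the activation, no gradient

# Synchronous pipeline view for brevity: in the asynchronous variant each
# module would instead consume (activation, label) pairs from its buffer
# at its own pace, tolerating stale inputs from upstream.
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
for j in range(len(modules)):
    buffers[j].append((x, y))
    x_old, y_old = buffers[j][0]   # possibly delayed input from the buffer
    x = local_step(j, x_old, y_old)
```

Because each `local_step` touches only one module's parameters, the loop body for different `j` can in principle run on different workers, which is the source of the linear parallelization mentioned above.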
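The bandwidth-reduction idea can likewise be illustrated with a small, assumption-level sketch of online vector quantization (this is an illustration of the general technique, not the paper's exact scheme): activations are mapped to indices into a small shared codebook before transmission, so the replay buffer stores integer indices rather than floating-point activations, and the codebook is updated in a streaming fashion.

```python
import torch

def quantize(h, codebook):
    """Return nearest-codeword indices for each d-dim activation vector."""
    # h: (n, d), codebook: (k, d)
    dists = torch.cdist(h, codebook)   # (n, k) pairwise distances
    return dists.argmin(dim=1)         # (n,) integer codes, k small

def dequantize(idx, codebook):
    return codebook[idx]               # (n, d) reconstructed activations

def update_codebook(h, idx, codebook, lr=0.05):
    """Online codebook update: move each selected codeword toward the
    mean of the activations currently assigned to it."""
    for k in idx.unique():
        mask = idx == k
        codebook[k] += lr * (h[mask].mean(dim=0) - codebook[k])

# Sender transmits only indices; the receiver reconstructs with the shared
# codebook, e.g. 16 codewords cost 4 bits per vector instead of 64 floats.
h = torch.randn(256, 64)               # a batch of activation vectors
codebook = torch.randn(16, 64)         # illustrative codebook size
idx = quantize(h, codebook)
update_codebook(h, idx, codebook)
h_hat = dequantize(idx, codebook)
```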