Pipelined Backpropagation at Scale: Training Large Models without Batches

New hardware can substantially increase the speed and efficiency of deep neural network training. To guide the development of future hardware architectures, it is pertinent to explore the hardware and machine learning properties of alternative training algorithms. In this work we evaluate the use of small batch, fine-grained Pipelined Backpropagation, an asynchronous pipeline parallel training algorithm that has significant hardware advantages. We introduce two methods, Spike Compensation and Linear Weight Prediction, that effectively mitigate the downsides caused by the asynchronicity of Pipelined Backpropagation and outperform existing techniques in our setting. We show that appropriate normalization and small batch sizes can also aid training. With our methods, fine-grained Pipelined Backpropagation using a batch size of one can match the accuracy of SGD for multiple networks trained on CIFAR-10 and ImageNet. Simple scaling rules allow the use of existing hyperparameters for traditional training without additional tuning.

Related content

Backpropagation

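Strictly speaking, the term backpropagation refers only to the algorithm for computing the gradient, not to how the gradient is used. The term is often used loosely, however, to refer to the entire learning algorithm, including how the gradient is used, for example by stochastic gradient descent. Backpropagation generalizes the gradient computation in the delta rule, which is the single-layer version of backpropagation, and is in turn generalized by automatic differentiation, where backpropagation is a special case of reverse accumulation (or "reverse mode"). In machine learning, backpropagation (backprop) is a widely used algorithm for training feedforward neural networks for supervised learning. Generalizations of backpropagation exist for other artificial neural networks (ANNs); this class of algorithms is generically referred to as "backpropagation". The algorithm works by using the chain rule to compute the gradient of the loss function with respect to each weight, one layer at a time, iterating backward from the last layer so that intermediate terms in the chain rule are not recomputed.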
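The layer-by-layer chain-rule computation described above can be illustrated with a minimal sketch. It assumes a tiny two-layer network with a tanh hidden layer, a linear output, and a squared-error loss; the names `forward`, `loss`, `backward`, `W1`, and `W2` are hypothetical and chosen only for this illustration, not taken from the paper.

```python
import math

def forward(x, W1, W2):
    """Forward pass: h = tanh(W1 x), y = W2 h. Returns (h, y)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = [sum(w * hi for w, hi in zip(row, h)) for row in W2]
    return h, y

def loss(y, target):
    """Squared-error loss (an assumption made for this sketch)."""
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))

def backward(x, target, W1, W2):
    """Gradients of the loss w.r.t. W1 and W2, computed last layer first."""
    h, y = forward(x, W1, W2)

    # Output layer: dL/dy, then dL/dW2[k][j] = delta2[k] * h[j].
    delta2 = [yi - ti for yi, ti in zip(y, target)]
    grad_W2 = [[d * hj for hj in h] for d in delta2]

    # Hidden layer: reuse delta2 instead of re-deriving the output terms.
    dh = [sum(delta2[k] * W2[k][j] for k in range(len(W2)))
          for j in range(len(h))]
    # d tanh(a)/da = 1 - tanh(a)^2, and h already stores tanh(a).
    delta1 = [dhj * (1.0 - hj ** 2) for dhj, hj in zip(dh, h)]
    grad_W1 = [[d * xi for xi in x] for d in delta1]

    return grad_W1, grad_W2
```

Reusing `delta2` when computing the hidden-layer gradient is the point of iterating backward: each layer's error term is computed once and passed to the layer before it.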
