New hardware can substantially increase the speed and efficiency of deep neural network training. To guide the development of future hardware architectures, it is pertinent to explore the hardware and machine learning properties of alternative training algorithms. In this work we evaluate the use of small batch, fine-grained Pipelined Backpropagation, an asynchronous pipeline parallel training algorithm that has significant hardware advantages. We introduce two methods, Spike Compensation and Linear Weight Prediction, that effectively mitigate the downsides caused by the asynchronicity of Pipelined Backpropagation and outperform existing techniques in our setting. We show that appropriate normalization and small batch sizes can also aid training. With our methods, fine-grained Pipelined Backpropagation using a batch size of one can match the accuracy of SGD for multiple networks trained on CIFAR-10 and ImageNet. Simple scaling rules allow the use of existing hyperparameters for traditional training without additional tuning.
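The abstract only names the two mitigation methods. As a rough illustration of the kind of correction Linear Weight Prediction performs, the sketch below extrapolates a pipeline stage's weights along the optimizer's momentum direction to approximate the weights that will be in place when the delayed gradient arrives. It is a minimal sketch under assumed conventions (SGD with momentum, velocity update v ← m·v + g, a fixed pipeline delay, and a geometric-decay extrapolation coefficient); the helper names and the exact coefficient are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed conventions, not the paper's implementation).
# In fine-grained Pipelined Backpropagation, a stage's forward pass uses weights
# that will be updated `delay` more times before the matching gradient is applied.
# Linear weight prediction extrapolates the current weights along the momentum
# (velocity) direction to approximate those future weights.
import torch


def sgd_momentum_step(w, velocity, grad, lr, momentum):
    """One plain SGD-with-momentum update (velocity convention: v <- m*v + g)."""
    velocity = momentum * velocity + grad
    return w - lr * velocity, velocity


def predicted_weights(w, velocity, lr, momentum, delay):
    """Extrapolate weights `delay` optimizer steps ahead, assuming future steps
    are dominated by the decaying momentum term (hypothetical helper)."""
    # Geometric sum m + m^2 + ... + m^delay of the momentum contributions.
    horizon = sum(momentum ** k for k in range(1, delay + 1))
    return w - lr * horizon * velocity


# Toy usage: approximate the weights a stage with delay 4 will see at gradient time.
w = torch.zeros(3)
v = torch.ones(3)
w_hat = predicted_weights(w, v, lr=0.1, momentum=0.9, delay=4)
```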