Intensive communication and synchronization costs for gradients and parameters are the well-known bottleneck of distributed deep learning training. Based on the observation that synchronous SGD (SSGD) achieves good convergence accuracy while asynchronous SGD (ASGD) delivers faster raw training speed, we propose Several Steps Delay SGD (SSD-SGD) to combine their merits and tackle the communication bottleneck through communication sparsification. SSD-SGD combines global synchronous updates on the parameter servers with asynchronous local updates on the workers in each periodic iteration. This periodic and flexible synchronization allows SSD-SGD to achieve both good convergence accuracy and fast training speed. To the best of our knowledge, this strikes a new balance between synchronization quality and communication sparsification, and improves the trade-off between accuracy and training speed. Specifically, the core components of SSD-SGD are a warm-up stage, a steps-delay stage, and our novel global gradient for local update (GLU) algorithm. GLU is critical for local update operations to effectively compensate for the delayed local weights. Furthermore, we implement SSD-SGD on the MXNet framework and comprehensively evaluate its performance on the CIFAR-10 and ImageNet datasets. Experimental results show that SSD-SGD accelerates distributed training by up to 110% under different experimental configurations while maintaining good convergence accuracy.
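
To make the periodic-synchronization idea concrete, the following is a minimal NumPy sketch of a single worker that runs several local SGD steps between global synchronizations, in the spirit described above. The function names (ssd_sgd_worker, local_gradient), the delay_steps parameter, and especially the glu_term compensation formula are illustrative assumptions for this sketch only, not the paper's actual GLU algorithm or MXNet implementation.

```python
# Hypothetical sketch of a periodic-synchronization worker loop in the
# spirit of SSD-SGD, using plain NumPy. The GLU compensation term below
# is a simplified stand-in, not the paper's exact formula.
import numpy as np

def local_gradient(weights, batch):
    """Placeholder gradient: a simple quadratic loss ||Xw - y||^2 for illustration."""
    X, y = batch
    return 2.0 * X.T @ (X @ weights - y) / len(y)

def ssd_sgd_worker(weights, batches, lr=0.01, delay_steps=4):
    """Run `delay_steps` local updates between (emulated) global synchronizations."""
    global_weights = weights.copy()        # last weights pulled from the server
    glu_term = np.zeros_like(weights)      # stand-in for the GLU compensation
    for step, batch in enumerate(batches):
        grad = local_gradient(weights, batch)
        # Local update compensated by the (assumed) global-gradient term.
        weights -= lr * (grad + glu_term)
        if (step + 1) % delay_steps == 0:
            # Periodic synchronization: in a real system the accumulated update
            # would be pushed to the parameter server and fresh weights pulled
            # back; here we only emulate the local bookkeeping.
            accumulated = (global_weights - weights) / lr
            glu_term = accumulated / delay_steps   # crude "global gradient" proxy
            global_weights = weights.copy()
    return weights

# Usage: synthetic data, two synchronization rounds of four local steps each.
rng = np.random.default_rng(0)
w = rng.normal(size=3)
data = [(rng.normal(size=(8, 3)), rng.normal(size=8)) for _ in range(8)]
w = ssd_sgd_worker(w, data)
```

Communication sparsification comes from the `delay_steps` interval: the worker exchanges data with the parameter server only once every several iterations instead of every step, while the compensation term is meant to keep the stale local weights from drifting too far from the global model.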

