The intensive communication and synchronization cost of exchanging gradients and parameters is a well-known bottleneck of distributed deep-learning training. Based on the observation that Synchronous SGD (SSGD) achieves good convergence accuracy while Asynchronous SGD (ASGD) delivers faster raw training speed, we propose Several Steps Delay SGD (SSD-SGD) to combine their merits, aiming to tackle the communication bottleneck via communication sparsification. In each periodic iteration, SSD-SGD combines global synchronous updates on the parameter servers with asynchronous local updates on the workers. This periodic and flexible synchronization allows SSD-SGD to achieve both good convergence accuracy and fast training speed. To the best of our knowledge, we strike a new balance between synchronization quality and communication sparsification, and improve the trade-off between accuracy and training speed. Specifically, the core components of SSD-SGD are a proper warm-up stage, a steps-delay stage, and our novel algorithm of global gradient for local update (GLU). GLU is critical for local update operations to effectively compensate for the delayed local weights. Furthermore, we implement SSD-SGD on the MXNet framework and comprehensively evaluate its performance on the CIFAR-10 and ImageNet datasets. Experimental results show that SSD-SGD accelerates distributed training under different experimental configurations, by up to 110%, while achieving good convergence accuracy.
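To make the periodic scheme concrete, below is a minimal, framework-agnostic sketch of the training loop the abstract describes: each worker takes several delayed local steps between two global synchronizations, and a GLU-style correction reuses the last globally synchronized gradient to compensate the delayed local weights. All names (`local_gradient`, `sync_period`, the exact compensation rule) are illustrative assumptions for exposition, not the paper's implementation.

```python
# Illustrative sketch of SSD-SGD-style periodic synchronization (NumPy only).
# The GLU compensation term below is a placeholder assumption, not the paper's exact rule.
import numpy as np

def local_gradient(w, rng):
    # Stand-in for a worker's mini-batch gradient on its local data shard.
    return w + rng.normal(scale=0.1, size=w.shape)

def train(num_workers=4, sync_period=4, rounds=10, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    global_w = rng.normal(size=8)            # parameters held by the parameter server
    workers = [global_w.copy() for _ in range(num_workers)]
    global_grad = np.zeros_like(global_w)    # last synchronized ("global") gradient

    for _ in range(rounds):
        # Asynchronous phase: each worker takes `sync_period` delayed local steps,
        # using the stale global gradient as a GLU-style correction (assumed form).
        for w in workers:
            for _ in range(sync_period):
                g = local_gradient(w, rng)
                w -= lr * (g + global_grad)   # local update compensated by the global gradient

        # Synchronous phase: the server aggregates worker gradients and updates globally.
        agg = np.mean([local_gradient(w, rng) for w in workers], axis=0)
        global_w -= lr * agg
        global_grad = agg                     # cached for the next delay window
        workers = [global_w.copy() for _ in range(num_workers)]
    return global_w

if __name__ == "__main__":
    print(train())
```

Communication sparsification comes from the ratio of local to global steps: the server round-trip happens once per `sync_period` local updates rather than once per mini-batch.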

