Distributed training is an effective way to accelerate the training of large-scale deep learning models. However, the parameter exchange and synchronization required by distributed stochastic gradient descent (SGD) introduce substantial communication overhead. Gradient compression is an effective way to reduce this overhead, and many Top-$k$ sparsification based compression methods have been proposed for synchronous SGD. However, centralized methods based on parameter servers suffer from a single point of failure and limited scalability, while decentralized methods that exchange global parameters may slow the convergence of training. In contrast to Top-$k$ based methods, we propose a gradient compression method that sketches the global gradient vector, using a Count-Sketch structure to store gradients and thereby reduce the loss of accuracy during training; we name it global-sketching SGD (gs-SGD). gs-SGD achieves better convergence efficiency on deep learning models and has a communication complexity of $O(\log d \cdot \log P)$, where $d$ is the number of model parameters and $P$ is the number of workers. We conducted experiments on GPU clusters to verify that our method converges more efficiently than global Top-$k$ and sketching-based methods. In addition, gs-SGD achieves 1.3-3.1x higher throughput than gTop-$k$ and 1.1-1.2x higher throughput than the original Sketched-SGD.
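To make the Count-Sketch idea concrete, the following is a minimal, self-contained Python/NumPy sketch of how a Count-Sketch can summarize a gradient vector and recover its large coordinates. The table sizes, hash construction, and recovery step here are illustrative assumptions, not the paper's exact gs-SGD implementation.

```python
# Minimal Count-Sketch for dense gradient vectors (illustrative only; the
# gs-SGD specifics such as table dimensions, hashing, and Top-k recovery
# are assumptions, not the authors' implementation).
import numpy as np

class CountSketch:
    def __init__(self, rows, cols, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.rows, self.cols = rows, cols
        # One hash bucket and one random sign per row for every coordinate.
        self.bucket = rng.integers(0, cols, size=(rows, dim))
        self.sign = rng.choice([-1.0, 1.0], size=(rows, dim))
        self.table = np.zeros((rows, cols))

    def insert(self, grad):
        # Accumulate each signed gradient coordinate into one bucket per row.
        for r in range(self.rows):
            np.add.at(self.table[r], self.bucket[r], self.sign[r] * grad)

    def query(self, idx):
        # The median of the signed bucket values estimates coordinate idx.
        vals = self.table[np.arange(self.rows), self.bucket[:, idx]]
        return np.median(vals * self.sign[:, idx])

# Example: sketch a gradient with a few heavy-hitter coordinates, then
# estimate those coordinates from the sketch (workers' sketches could be
# merged by simply adding their tables before querying).
d = 10_000
g = np.random.randn(d) * 0.01
g[[3, 42, 777]] = [5.0, -4.0, 3.0]
cs = CountSketch(rows=5, cols=256, dim=d)
cs.insert(g)
print([round(cs.query(i), 2) for i in (3, 42, 777)])
```

Because the sketch table has $O(\log d)$ rows of fixed width and sketches from $P$ workers can be merged by element-wise addition, communicating sketches rather than full gradients is what yields the $O(\log d \cdot \log P)$ communication complexity stated above.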