The training efficiency of complex deep learning models can be significantly improved through distributed optimization. However, this process is often hindered by the substantial communication cost incurred between workers and a parameter server at every iteration. To address this bottleneck, in this paper we present a new communication-efficient algorithm, called ${\sf S}^3$GD-MV, that offers the synergistic benefits of both sparsification and sign quantization. The workers in ${\sf S}^3$GD-MV select the top-$K$ magnitude components of their local gradient vector and send only the signs of these components to the server. The server then aggregates the signs and returns the result via a majority vote rule. Our analysis shows that, under certain mild conditions, ${\sf S}^3$GD-MV converges at the same rate as signSGD while significantly reducing communication costs, provided the sparsification parameter $K$ is chosen appropriately based on the number of workers and the size of the deep learning model. Experimental results on both independent and identically distributed (IID) and non-IID datasets demonstrate that ${\sf S}^3$GD-MV attains higher accuracy than signSGD while significantly reducing communication costs. These findings highlight the potential of ${\sf S}^3$GD-MV as a promising solution for communication-efficient distributed optimization in deep learning.
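To make the mechanism concrete, the following is a minimal NumPy sketch of the worker-side top-$K$ sign compression and the server-side majority vote described above; the function names, toy dimensions, and update step are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def worker_message(grad, k):
    """Select the top-k magnitude components of a local gradient and
    return their indices together with the signs of those components."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]    # indices of the k largest magnitudes
    return idx, np.sign(grad[idx]).astype(np.int8)  # one sign bit per selected coordinate

def server_majority_vote(messages, dim):
    """Aggregate the sparse sign messages from all workers by a
    coordinate-wise majority vote and return the voted sign vector."""
    votes = np.zeros(dim)
    for idx, signs in messages:
        votes[idx] += signs
    return np.sign(votes)  # zero wherever the votes cancel or no worker reported

# Toy usage: 4 workers, model dimension 10, sparsification level K = 3.
rng = np.random.default_rng(0)
dim, k = 10, 3
messages = [worker_message(rng.normal(size=dim), k) for _ in range(4)]
update_direction = server_majority_vote(messages, dim)
# A descent step would then take the form: params -= learning_rate * update_direction
```

Each worker thus transmits only $K$ indices and $K$ sign bits instead of a dense gradient, which is the source of the communication savings claimed above.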