Stochastic optimization algorithms implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the communication overhead of exchanging information, such as stochastic gradients, among workers. Sparse communication with memory and adaptive aggregation are two successful frameworks among the various techniques proposed to address this issue. In this paper, we exploit the advantages of Sparse communication and Adaptive aggregated Stochastic Gradients to design a communication-efficient distributed algorithm named SASG. Specifically, we determine which workers need to communicate with the parameter server based on an adaptive aggregation rule and then sparsify the transmitted information. Our algorithm therefore reduces both the number of communication rounds and the number of transmitted bits in the distributed system. We define an auxiliary sequence and establish convergence results for the algorithm via a Lyapunov function analysis. Experiments on training deep neural networks show that our algorithm significantly reduces communication overhead compared with previous methods, with little impact on training and testing accuracy.
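To make the scheme described above concrete, the following is a minimal sketch of the two ingredients the abstract combines: an adaptive (LAG-style) rule that lets a worker skip a communication round when its gradient has changed little, and top-k sparsification with an error-memory term for the rounds in which it does communicate. The threshold rule, the placeholder local gradient, and all names here are illustrative assumptions, not the exact SASG algorithm.

```python
import numpy as np

def topk_sparsify(vec, k):
    """Keep the k largest-magnitude entries of vec, zero out the rest."""
    out = np.zeros_like(vec)
    idx = np.argpartition(np.abs(vec), -k)[-k:]
    out[idx] = vec[idx]
    return out

class Worker:
    """One worker; keeps a memory of the sparsification error and a copy of
    the gradient it last transmitted, used by the adaptive aggregation rule."""
    def __init__(self, dim, k):
        self.k = k
        self.memory = np.zeros(dim)     # residual not yet transmitted
        self.last_sent = np.zeros(dim)  # gradient the server currently holds

    def local_gradient(self, x):
        # Placeholder stochastic gradient; a real worker would use a mini-batch
        # of its local data.
        return x + 0.1 * np.random.randn(*x.shape)

    def maybe_communicate(self, x, threshold):
        g = self.local_gradient(x)
        # Illustrative adaptive rule: skip this round if the new gradient is
        # close to the one the server already has (server reuses the stale one).
        if np.linalg.norm(g - self.last_sent) ** 2 <= threshold:
            return None
        corrected = g + self.memory                # add back accumulated error
        sparse = topk_sparsify(corrected, self.k)  # transmit only k coordinates
        self.memory = corrected - sparse           # remember what was dropped
        self.last_sent = sparse
        return sparse

# Illustrative server loop: aggregate fresh sparse gradients and reuse stale
# ones for workers that skipped communication this round.
dim, n_workers, lr, threshold = 10, 4, 0.1, 1e-3
workers = [Worker(dim, k=3) for _ in range(n_workers)]
x = np.random.randn(dim)
for _ in range(100):
    msgs = [w.maybe_communicate(x, threshold) for w in workers]
    agg = sum(m if m is not None else w.last_sent for w, m in zip(workers, msgs))
    x -= lr * agg / n_workers
```

In this sketch, a skipped round saves an entire communication round, while top-k sparsification with memory reduces the bits sent in the rounds that do occur, mirroring the two savings the abstract claims.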