Stochastic optimization algorithms implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the communication overhead of exchanging information, such as stochastic gradients, between different workers. Sparse communication with memory and adaptive aggregation are two successful frameworks among the various techniques proposed to address this issue. In this paper, we exploit the advantages of Sparse communication and Adaptive aggregated Stochastic Gradients to design a communication-efficient distributed algorithm named SASG. Specifically, we first determine the workers that need to communicate based on the adaptive aggregation rule, and then sparsify the transmitted information. Our algorithm therefore reduces both the number of communication rounds and the number of communication bits in the distributed system. We define an auxiliary sequence and establish convergence results for the algorithm with the help of a Lyapunov function analysis. Experiments on training deep neural networks show that our algorithm significantly reduces the number of communication rounds and bits compared to previous methods, with little or no impact on training and test accuracy.
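To make the two ingredients concrete, the following sketch is our own illustration (under stated assumptions, not the paper's implementation) of how an adaptive aggregation rule and memory-corrected sparsification could be combined on a single worker. The names `Worker`, `top_k_sparsify`, `threshold`, and `local_grad` are hypothetical; the actual SASG rule and constants are given in the paper.

```python
# Minimal sketch: adaptive aggregation (skip unchanged gradients) combined with
# top-k sparsification plus a memory term for the dropped coordinates.
# This is an assumed, simplified illustration, not the authors' code.
import numpy as np

def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries of vec; zero out the rest."""
    sparse = np.zeros_like(vec)
    idx = np.argpartition(np.abs(vec), -k)[-k:]
    sparse[idx] = vec[idx]
    return sparse

class Worker:
    def __init__(self, dim, k):
        self.k = k
        self.memory = np.zeros(dim)       # residual of previously dropped entries
        self.last_sent = np.zeros(dim)    # gradient last communicated to the server

    def step(self, local_grad, threshold):
        # Adaptive aggregation rule (LAG-style condition, assumed here):
        # skip communication if the gradient changed little since the last round.
        if np.linalg.norm(local_grad - self.last_sent) ** 2 <= threshold:
            return None                   # server reuses the previously received gradient
        # Otherwise, sparsify the memory-corrected gradient before sending.
        corrected = local_grad + self.memory
        sparse = top_k_sparsify(corrected, self.k)
        self.memory = corrected - sparse  # keep the dropped part for future rounds
        self.last_sent = local_grad
        return sparse

# Toy usage: one worker, random gradients, a fixed (hypothetical) threshold.
rng = np.random.default_rng(0)
worker = Worker(dim=10, k=3)
for t in range(5):
    msg = worker.step(rng.normal(size=10), threshold=5.0)
    print(f"round {t}:", "skipped" if msg is None else f"sent {np.count_nonzero(msg)} entries")
```

In this sketch the aggregation test saves entire communication rounds, while the top-k step with memory reduces the number of bits in the rounds that do occur, mirroring the two savings described above.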