Local updates have recently become a powerful technique in centralized settings for improving communication efficiency through periodic communication. In decentralized settings, however, it remains unclear how to combine local updates with decentralized communication efficiently. In this work, we propose an algorithm named LD-SGD, which supports arbitrary update schemes that alternate between multiple Local updates and multiple Decentralized SGD steps, and we provide an analytical framework for LD-SGD. Under this framework, we present a sufficient condition that guarantees convergence. We show that LD-SGD converges to a critical point for a wide range of update schemes when the objective is non-convex and the training data are independent but not identically distributed (non-IID). Moreover, our framework offers insights into the design of update schemes for decentralized optimization. As examples, we specify two update schemes and show how they improve communication efficiency. The first scheme alternates between a given number of local update steps and a given number of global (decentralized SGD) update steps; our analysis shows that the ratio of the number of local updates to the number of decentralized SGD steps trades off communication against computation. The second scheme periodically shrinks the length of the local-update phase. We show, both theoretically and empirically, that this decaying strategy improves communication efficiency.
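The alternation between local updates and decentralized SGD steps can be illustrated with a small simulation. What follows is a minimal sketch, not the authors' implementation. It assumes one scalar parameter per node and quadratic local objectives with node-specific minimizers, used here as a stand-in for non-IID data. It also assumes a ring topology with uniform 1/3 gossip weights, and a decentralized SGD step modeled as a local gradient step followed by averaging with a doubly stochastic mixing matrix `W`. For the second scheme it uses a hypothetical decay rule that halves the local-update length every `rounds_per_phase` rounds. All function names and parameters (`i1`, `i2`, `rounds_per_phase`, `lr`) are illustrative, and the paper's exact schedules and weights may differ.

```python
import random
from typing import Callable, List, Sequence

Grad = Callable[[float], float]  # (stochastic) gradient oracle for one node


def ring_mixing_matrix(n: int) -> List[List[float]]:
    """Doubly stochastic gossip matrix: each node averages itself and its two ring neighbours."""
    W = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for j in (k - 1, k, k + 1):
            W[k][j % n] += 1.0 / 3.0
    return W


def alternating_schedule(T: int, i1: int, i2: int) -> List[bool]:
    """Scheme 1: i1 local steps, then i2 decentralized SGD steps, repeated (True = communicate)."""
    period = i1 + i2
    return [t % period >= i1 for t in range(T)]


def decaying_schedule(T: int, i1: int, i2: int, rounds_per_phase: int) -> List[bool]:
    """Scheme 2 with a hypothetical decay rule: halve the local length i1 every rounds_per_phase rounds."""
    assert i2 >= 1, "need at least one communication step per round"
    schedule: List[bool] = []
    rounds = 0
    while len(schedule) < T:
        schedule += [False] * i1 + [True] * i2
        rounds += 1
        if rounds % rounds_per_phase == 0:
            i1 //= 2  # eventually reaches 0, i.e. plain decentralized SGD
    return schedule[:T]


def ld_sgd(x0: Sequence[float], grads: Sequence[Grad], W: List[List[float]],
           schedule: Sequence[bool], lr: float) -> List[float]:
    """Run the sketch of LD-SGD; node k holds the scalar parameter x[k]."""
    n = len(x0)
    x = list(x0)
    for communicate in schedule:
        # Every step begins with one SGD step on each node's own data.
        half = [x[k] - lr * grads[k](x[k]) for k in range(n)]
        if communicate:
            # Decentralized SGD step: gossip-average the updated parameters with W.
            x = [sum(W[k][j] * half[j] for j in range(n)) for k in range(n)]
        else:
            # Local update: no communication between nodes.
            x = half
    return x


def make_quadratic_grads(targets: Sequence[float], noise: float = 0.0,
                         seed: int = 0) -> List[Grad]:
    """Node k minimizes 0.5 * (x - targets[k])**2; distinct targets mimic non-IID data."""
    rng = random.Random(seed)

    def make(b: float) -> Grad:
        return lambda x: (x - b) + rng.gauss(0.0, noise)

    return [make(b) for b in targets]
```

For example, `ld_sgd([0.0] * 4, make_quadratic_grads([0.0, 1.0, 2.0, 3.0]), ring_mixing_matrix(4), decaying_schedule(300, 8, 2, 5), lr=0.1)` drives the node average toward the global minimizer 1.5. Under `alternating_schedule`, a fraction `i2 / (i1 + i2)` of the steps communicate. In this toy setup, a larger `i1` therefore saves communication but lets nodes drift further toward their own targets between gossip rounds. Passing a nonzero `noise` makes the gradients stochastic.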