In this paper, we propose and analyze SQuARM-SGD, a communication-efficient algorithm for decentralized training of large-scale machine learning models over a network. In SQuARM-SGD, each node performs a fixed number of local SGD steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors, with communication regulated by a locally computable triggering criterion. We provide convergence guarantees for our algorithm for general (non-convex) and convex smooth objectives, which, to the best of our knowledge, is the first theoretical analysis for compressed decentralized SGD with momentum updates. We show that the convergence rate of SQuARM-SGD matches that of vanilla SGD. We empirically show that including momentum updates in SQuARM-SGD can lead to better test performance than the current state-of-the-art, which does not consider momentum updates.
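To make the algorithmic ingredients named above concrete, the following is a minimal sketch of one node's behavior as described in the abstract: local Nesterov-momentum SGD steps, followed by a sparsified-and-quantized update that is sent to neighbors only when a locally computable triggering criterion fires. The specific compression operator (top-k plus sign quantization), the drift-based trigger, the threshold, and the mixing weight are illustrative assumptions, not the paper's exact choices.

```python
# Illustrative sketch of one SQuARM-SGD node (assumptions noted below; not the
# paper's exact operators or schedules).
import numpy as np


def topk_sign_compress(v, k):
    """Keep the k largest-magnitude entries, quantized to sign * mean magnitude (assumed operator)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    scale = np.mean(np.abs(v[idx])) if k > 0 else 0.0
    out[idx] = np.sign(v[idx]) * scale
    return out


class SquarmNode:
    def __init__(self, dim, lr=0.05, momentum=0.9, local_steps=4, k=10, trigger_thresh=1e-3):
        self.x = np.zeros(dim)        # local model parameters
        self.u = np.zeros(dim)        # Nesterov momentum buffer
        self.x_hat = np.zeros(dim)    # model copy that neighbors last received
        self.lr, self.beta = lr, momentum
        self.local_steps, self.k = local_steps, k
        self.trigger_thresh = trigger_thresh

    def local_round(self, stochastic_grad):
        """Run a fixed number of local Nesterov-momentum SGD steps, then maybe communicate."""
        for _ in range(self.local_steps):
            g = stochastic_grad(self.x + self.beta * self.u)  # look-ahead gradient
            self.u = self.beta * self.u - self.lr * g
            self.x += self.u
        drift = self.x - self.x_hat
        # Locally computable triggering criterion (assumed form): send only if the
        # local model has drifted enough from what neighbors last received.
        if np.linalg.norm(drift) > self.trigger_thresh:
            msg = topk_sign_compress(drift, self.k)
            self.x_hat += msg
            return msg                # compressed update sent to neighbors
        return None                   # communication skipped this round

    def receive(self, neighbor_msgs, mix_weight=0.5):
        """Mix received compressed neighbor updates into the local model (gossip-style averaging)."""
        if neighbor_msgs:
            self.x += mix_weight * np.mean(neighbor_msgs, axis=0)
```

The separation between `x` and `x_hat` reflects the event-triggered design: each node tracks what its neighbors believe its model to be, and only pays communication cost when the discrepancy grows large enough to matter.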