Decentralization is a promising method of scaling up parallel machine learning systems. In this paper, we provide a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting. Our lower bound reveals a theoretical gap in the known convergence rates of many existing decentralized training algorithms, such as D-PSGD. We prove by construction that this lower bound is tight and achievable. Motivated by our insights, we further propose DeTAG, a practical gossip-style decentralized algorithm that achieves the lower bound up to only a logarithmic factor. Empirically, we compare DeTAG with other decentralized algorithms on image classification tasks and show that DeTAG converges faster than the baselines, especially on unshuffled data and in sparse networks.
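To make the gossip-style setting concrete, here is a minimal sketch of one D-PSGD-style iteration: each worker takes a local stochastic gradient step and then averages its parameters with its neighbors through a mixing matrix that encodes the network topology. This is an illustrative simplification, not the paper's DeTAG algorithm; the gradient oracle `grad`, the gossip matrix `W`, and the step size `lr` are assumed inputs.

```python
import numpy as np

def dpsgd_step(x, W, grad, lr):
    """One illustrative gossip-SGD iteration (not DeTAG itself).

    x:    (n, d) array, row i holds worker i's current parameters
    W:    (n, n) doubly-stochastic gossip matrix over the network topology
    grad: callable grad(i, x_i) returning a stochastic gradient for worker i
    lr:   step size
    """
    n = x.shape[0]
    # Each worker first takes a local stochastic gradient step ...
    local = np.stack([x[i] - lr * grad(i, x[i]) for i in range(n)])
    # ... then gossip-averages with its neighbors: row i mixes via W[i, :].
    return W @ local
```

A sparser `W` (fewer nonzero entries per row) corresponds to a sparser communication graph, which is exactly the regime where the abstract claims DeTAG's advantage over such baselines is most visible.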