The decentralized Federated Learning (FL) setting avoids the role of a potentially unreliable or untrustworthy central host by utilizing groups of clients to collaboratively train a model via localized training and model/gradient sharing. Most existing decentralized FL algorithms require synchronization of client models, where the speed of synchronization depends upon the slowest client. In this work, we propose SWIFT: a novel wait-free decentralized FL algorithm that allows clients to conduct training at their own speed. Theoretically, we prove that SWIFT matches the gold-standard iteration convergence rate $\mathcal{O}(1/\sqrt{T})$ of parallel stochastic gradient descent for convex and non-convex smooth optimization, where $T$ is the total number of iterations. Furthermore, we provide theoretical results for IID and non-IID settings without any bounded-delay assumption on slow clients, an assumption required by other asynchronous decentralized FL algorithms. Although SWIFT achieves the same iteration convergence rate with respect to $T$ as other state-of-the-art (SOTA) parallel stochastic algorithms, it converges faster with respect to run-time due to its wait-free structure. Our experimental results demonstrate that SWIFT's run-time is reduced by a large decrease in communication time per epoch, which falls by an order of magnitude compared to synchronous counterparts. Furthermore, SWIFT reaches target loss levels for image classification, over IID and non-IID data settings, upwards of 50% faster than existing SOTA algorithms.
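To make the wait-free idea concrete, below is a minimal simulated sketch (not SWIFT's actual implementation) in which each client runs local SGD at its own pace and mixes its model with whatever neighbor models it has most recently received, instead of blocking on a synchronization barrier. The `Client` class, the ring topology, the toy quadratic objective, and the uniform mixing scheme are all illustrative assumptions.

```python
# Hypothetical sketch of a wait-free decentralized update loop: each client keeps
# a buffer of the latest models pushed by its neighbors and never waits for them.
import numpy as np

class Client:
    def __init__(self, dim, neighbors, lr=0.1, rng=None):
        self.rng = rng or np.random.default_rng()
        self.x = self.rng.normal(size=dim)   # local model parameters
        self.neighbors = neighbors           # ids of neighboring clients
        self.buffer = {}                     # latest model received from each neighbor
        self.lr = lr

    def local_gradient(self, target):
        # toy stochastic gradient of ||x - target||^2 (stand-in for a real loss)
        return 2.0 * (self.x - target) + 0.01 * self.rng.normal(size=self.x.shape)

    def receive(self, sender_id, model):
        # non-blocking "communication": overwrite the stale copy, never wait
        self.buffer[sender_id] = model.copy()

    def step(self, target):
        # 1) local SGD step taken at this client's own speed
        self.x -= self.lr * self.local_gradient(target)
        # 2) mix with whatever neighbor models are currently available
        models = [self.x] + list(self.buffer.values())
        self.x = np.mean(models, axis=0)
        return self.x

# ring topology of 4 clients; client 2 is "slow" and updates half as often,
# yet never stalls the others (no bounded-delay assumption is enforced here)
rng = np.random.default_rng(0)
clients = [Client(dim=5, neighbors=[(i - 1) % 4, (i + 1) % 4], rng=rng)
           for i in range(4)]
target = np.ones(5)

for t in range(200):
    for i, c in enumerate(clients):
        if i == 2 and t % 2 == 1:
            continue                          # slow client skips this tick
        new_model = c.step(target)
        for j in c.neighbors:                 # push to neighbors without blocking them
            clients[j].receive(i, new_model)

print("final average distance to optimum:",
      np.mean([np.linalg.norm(c.x - target) for c in clients]))
```

Running the sketch shows all clients drifting toward the shared optimum even though the slow client participates at half the rate, which is the qualitative behavior the wait-free design is meant to preserve.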