Federated learning (FL) is a prevailing distributed learning paradigm, where a large number of workers jointly learn a model without sharing their training data. However, high communication costs can arise in FL due to large-scale (deep) learning models and bandwidth-constrained connections. In this paper, we introduce a communication-efficient algorithmic framework called CFedAvg for FL with non-i.i.d. datasets, which works with general (biased or unbiased) SNR-constrained compressors. We analyze the convergence rate of CFedAvg for non-convex functions with constant and decaying learning rates. The CFedAvg algorithm achieves an $\mathcal{O}(1 / \sqrt{mKT} + 1 / T)$ convergence rate with a constant learning rate, implying a linear speedup for convergence as the number of workers increases, where $K$ is the number of local steps per round, $T$ is the total number of communication rounds, and $m$ is the number of workers. This matches the convergence rate of distributed/federated learning without compression, thus achieving high communication efficiency without sacrificing learning accuracy in FL. Furthermore, we extend CFedAvg to the case of heterogeneous local steps, which allows different workers to perform different numbers of local steps to better adapt to their own circumstances. Interestingly, the noise/variance introduced by compression does not affect the order of the overall convergence rate for non-i.i.d. FL. We verify the effectiveness of our CFedAvg algorithm on three datasets with two gradient compression schemes of different compression ratios.
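To make the compressor assumption concrete, the following is a minimal sketch (not taken from the paper) of a biased top-$k$ sparsifier together with the empirical SNR it induces. The function names `topk_compress` and `snr_of` are illustrative placeholders, and the SNR expression used here, $\|x\|^2 / \|x - \mathcal{C}(x)\|^2$, is one common formulation that may differ in detail from the paper's exact definition of an SNR-constrained compressor.

```python
import numpy as np

def topk_compress(x: np.ndarray, k: int) -> np.ndarray:
    """Biased top-k sparsifier: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(x)
    if k <= 0:
        return out
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def snr_of(x: np.ndarray, compressed: np.ndarray) -> float:
    """Empirical SNR: ||x||^2 / ||x - C(x)||^2 (larger means less compression noise)."""
    err = np.linalg.norm(x - compressed) ** 2
    return np.inf if err == 0 else np.linalg.norm(x) ** 2 / err

# Hypothetical usage: compress a stand-in local model update at roughly 10x ratio.
# For top-k, the compression error satisfies ||x - C(x)||^2 <= (1 - k/d) ||x||^2,
# so the induced SNR is bounded below, which is the kind of constraint assumed above.
rng = np.random.default_rng(0)
g = rng.standard_normal(1000)   # stand-in for a worker's local update
c = topk_compress(g, k=100)
print(f"SNR of top-100 compression: {snr_of(g, c):.2f}")
```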