Federated Averaging (FedAvg) has emerged as the algorithm of choice for federated learning due to its simplicity and low communication cost. However, in spite of recent research efforts, its performance is not fully understood. We obtain tight convergence rates for FedAvg and prove that it suffers from `client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence. As a solution, we propose a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the `client-drift' in its local updates. We prove that SCAFFOLD requires significantly fewer communication rounds and is not affected by data heterogeneity or client sampling. Further, we show that (for quadratics) SCAFFOLD can take advantage of similarity in the clients' data, yielding even faster convergence. The latter is the first result to quantify the usefulness of local steps in distributed optimization.
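For concreteness, the control-variate correction can be sketched as follows. This is an illustrative summary under assumed notation rather than the paper's exact pseudocode: $x$ denotes the server model, $c$ the server control variate, $c_i$ the control variate of client $i$, $g_i$ a local stochastic gradient, $\eta_l$ and $\eta_g$ the local and global step sizes, $K$ the number of local steps, and $S$ the set of clients sampled in a round out of $N$ total. Each sampled client initializes $y_i \leftarrow x$ and runs $K$ corrected local steps before the server aggregates both the models and the control variates.
\begin{align*}
  y_i &\leftarrow y_i - \eta_l \bigl( g_i(y_i) - c_i + c \bigr)
      && \text{(corrected local step, repeated $K$ times)} \\
  c_i^{+} &\leftarrow c_i - c + \tfrac{1}{K\eta_l}\,(x - y_i)
      && \text{(client control-variate refresh)} \\
  x &\leftarrow x + \tfrac{\eta_g}{|S|} \textstyle\sum_{i \in S} (y_i - x),
  \qquad
  c \leftarrow c + \tfrac{1}{N} \textstyle\sum_{i \in S} \bigl(c_i^{+} - c_i\bigr)
      && \text{(server aggregation)}
\end{align*}
Intuitively, $c_i - c$ estimates how far client $i$'s local gradient direction drifts from the average direction across clients, and subtracting this estimate keeps the local iterates moving toward the global optimum even on heterogeneous data.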