Federated learning (FL) aims to minimize the communication complexity of training a model over heterogeneous data distributed across many clients. A common approach is local methods, where clients take multiple optimization steps over local data before communicating with the server (e.g., FedAvg). Local methods can exploit similarity between clients' data. However, in existing analyses, this comes at the cost of slow convergence in terms of the dependence on the number of communication rounds R. On the other hand, global methods, where clients simply return a gradient vector in each round (e.g., SGD), converge faster in terms of R but fail to exploit the similarity between clients even when clients are homogeneous. We propose FedChain, an algorithmic framework that combines the strengths of local methods and global methods to achieve fast convergence in terms of R while leveraging the similarity between clients. Using FedChain, we instantiate algorithms that improve upon previously known rates in the general convex and PL settings, and are near-optimal (via an algorithm-independent lower bound that we show) for problems that satisfy strong convexity. Empirical results support this theoretical gain over existing methods.
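To make the chaining idea concrete, below is a minimal sketch of a two-phase federated loop: a local-update phase (FedAvg-style multi-step client updates) followed by a global-update phase (one averaged gradient step per round). This is an illustration of the general structure described above, not the paper's exact FedChain instantiation; the quadratic client losses, step sizes, the even round split, and the names `local_phase`, `global_phase`, and `fedchain_sketch` are all assumptions made for the example.

```python
# Hedged sketch: chain a local-update phase into a global-update phase.
# All problem data and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n_clients = 10, 20

# Each client i holds an assumed quadratic loss f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = rng.normal(size=(n_clients, d, d))
b = rng.normal(size=(n_clients, d))

def client_grad(i, x):
    # Gradient of client i's quadratic loss at x.
    return A[i].T @ (A[i] @ x - b[i])

def local_phase(x, rounds, local_steps=10, lr=1e-3):
    """FedAvg-style phase: each round, every client takes several local
    gradient steps, then the server averages the resulting iterates."""
    for _ in range(rounds):
        updates = []
        for i in range(n_clients):
            xi = x.copy()
            for _ in range(local_steps):
                xi -= lr * client_grad(i, xi)
            updates.append(xi)
        x = np.mean(updates, axis=0)
    return x

def global_phase(x, rounds, lr=1e-3):
    """Global phase: each round, every client returns one gradient and the
    server takes a single averaged gradient step (federated SGD style)."""
    for _ in range(rounds):
        g = np.mean([client_grad(i, x) for i in range(n_clients)], axis=0)
        x -= lr * g
    return x

def fedchain_sketch(x0, R):
    # Chain the two phases; splitting the R communication rounds evenly
    # between them is an arbitrary choice for this illustration.
    x = local_phase(x0, rounds=R // 2)
    return global_phase(x, rounds=R - R // 2)

x_final = fedchain_sketch(np.zeros(d), R=50)
```

The intent of the chaining, as the abstract describes, is that the local phase benefits from client similarity early on, while the global phase supplies the faster convergence in the number of communication rounds R.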