A central question in federated learning (FL) is how to design optimization algorithms that minimize the communication cost of training a model over heterogeneous data distributed across many clients. A popular technique for reducing communication is the use of local steps, where clients take multiple optimization steps over local data before communicating with the server (e.g., FedAvg, SCAFFOLD). This contrasts with centralized methods, where clients take a single optimization step per communication round (e.g., Minibatch SGD). A recent lower bound on the communication complexity of first-order methods shows that centralized methods are optimal over highly heterogeneous data, whereas local methods are optimal over purely homogeneous data [Woodworth et al., 2020]. For intermediate heterogeneity levels, no algorithm is known to match the lower bound. In this paper, we propose a multistage optimization scheme that nearly matches the lower bound across all heterogeneity levels. The idea is to first run a local method until it reaches a heterogeneity-induced error floor, and then switch to a centralized method for the remaining rounds. Our analysis may help explain empirically successful stepsize decay methods in FL [Charles et al., 2020; Reddi et al., 2020]. We demonstrate the scheme's practical utility in image classification tasks.
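To make the two-phase idea concrete, the following is a minimal sketch, not the paper's exact algorithm: a FedAvg-style local phase runs for the first rounds, after which training switches to a Minibatch-SGD-style centralized phase. The quadratic client objectives, the stepsize `lr`, the number of local steps, and the switching round `T_switch` are all illustrative assumptions.

```python
# Minimal sketch of a local-then-centralized multistage scheme.
# All problem data and hyperparameters below are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim = 10, 5

# Hypothetical heterogeneous client objectives: f_i(x) = 0.5 * ||x - b_i||^2,
# where the spread of the b_i plays the role of data heterogeneity.
b = rng.normal(size=(num_clients, dim))
grad = lambda x, i: x - b[i]  # gradient of f_i at x

x = np.zeros(dim)
lr, local_steps, T_switch, T_total = 0.1, 5, 50, 100

for t in range(T_total):
    if t < T_switch:
        # Phase 1 (local method): each client takes several local gradient
        # steps, and the server averages the resulting iterates (FedAvg-style).
        updates = []
        for i in range(num_clients):
            x_i = x.copy()
            for _ in range(local_steps):
                x_i -= lr * grad(x_i, i)
            updates.append(x_i)
        x = np.mean(updates, axis=0)
    else:
        # Phase 2 (centralized method): one step per round using the average
        # of the clients' gradients at the current iterate (Minibatch SGD-style).
        g = np.mean([grad(x, i) for i in range(num_clients)], axis=0)
        x -= lr * g

print("final iterate: ", x)
print("global optimum:", b.mean(axis=0))  # minimizer of the average objective
```

In this sketch the switch point `T_switch` is fixed in advance; in practice one would switch once the local phase stops making progress, i.e., at the heterogeneity-induced error floor described above.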