Federated learning (FL) involves training a model over massive distributed devices, while keeping the training data localized. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, with new challenges including: (1) the straggler problem, where clients lag due to data or (computing and network) resource heterogeneity, and (2) the communication bottleneck, where a large number of clients communicate their local updates to a central server and bottleneck the server. Many existing FL methods focus on optimizing along only one dimension of the tradeoff space. Existing solutions use asynchronous model updating or tiering-based synchronous mechanisms to tackle the straggler problem. However, asynchronous methods can easily create a network communication bottleneck, while tiering may introduce biases, as it favors the faster tiers with shorter response latencies. To address these issues, we present FedAT, a novel Federated learning method with Asynchronous Tiers under Non-i.i.d. data. FedAT synergistically combines synchronous intra-tier training and asynchronous cross-tier training. By bridging synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect with improved convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance the training for further accuracy improvement. FedAT compresses the uplink and downlink communications using an efficient, polyline-encoding-based compression algorithm, thereby minimizing the communication cost. Results show that FedAT improves the prediction performance by up to 21.09% and reduces the communication cost by up to 8.5x, compared to state-of-the-art FL methods.
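To make the straggler-aware, weighted cross-tier aggregation idea concrete, the following is a minimal sketch, not the paper's reference implementation: it assumes the server keeps one model per tier plus a count of how many updates each tier has contributed, and it weights tiers that have updated less often (typically the slower, straggler-prone tiers) more heavily. The specific weight formula and the function name `aggregate_cross_tier` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def aggregate_cross_tier(tier_models, tier_update_counts):
    """Hypothetical straggler-aware weighted average of per-tier models.

    tier_models:        list of 1-D parameter vectors, one per tier
    tier_update_counts: how many updates each tier has pushed so far
    """
    counts = np.asarray(tier_update_counts, dtype=float)
    # Assumed weighting for illustration: a tier that has updated fewer
    # times receives a larger weight, so slow tiers are not drowned out
    # by fast tiers that update the server far more frequently.
    raw = counts.sum() - counts + 1.0
    weights = raw / raw.sum()
    return sum(w * m for w, m in zip(weights, tier_models))

# Toy usage: three tiers, where the third (slowest) tier has updated least.
models = [np.full(4, v) for v in (1.0, 2.0, 3.0)]
print(aggregate_cross_tier(models, tier_update_counts=[10, 6, 2]))
```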