Federated learning (FL) involves training a model over massive distributed devices, while keeping the training data localized. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, with new challenges including: (1) the straggler problem, where clients lag due to data or (computing and network) resource heterogeneity, and (2) the communication bottleneck, where a large number of clients communicate their local updates to a central server and bottleneck the server. Many existing FL methods focus on optimizing along only one dimension of the tradeoff space. Existing solutions use asynchronous model updating or tiering-based synchronous mechanisms to tackle the straggler problem. However, asynchronous methods can easily create a network communication bottleneck, while tiering may introduce biases, as it favors the faster tiers with shorter response latencies. To address these issues, we present FedAT, a novel Federated learning method with Asynchronous Tiers under Non-i.i.d. data. FedAT synergistically combines synchronous intra-tier training and asynchronous cross-tier training. By bridging synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect with improved convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance the training for further accuracy improvement. FedAT compresses the uplink and downlink communications using an efficient, polyline-encoding-based compression algorithm, thereby minimizing the communication cost. Results show that FedAT improves the prediction performance by up to 21.09% and reduces the communication cost by up to 8.5x, compared to state-of-the-art FL methods.
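To make the straggler-aware, weighted cross-tier aggregation idea concrete, the following is a minimal sketch, not the paper's reference implementation: it assumes the server keeps one model per tier plus a count of how many updates each tier has contributed, and it weights tiers that have updated less often (typically the slower, straggler-prone tiers) more heavily. The specific weight formula and the function name `aggregate_cross_tier` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def aggregate_cross_tier(tier_models, tier_update_counts):
    """Hypothetical straggler-aware weighted average of per-tier models.

    tier_models:        list of 1-D parameter vectors, one per tier
    tier_update_counts: how many updates each tier has pushed so far
    """
    counts = np.asarray(tier_update_counts, dtype=float)
    # Assumed weighting for illustration: a tier that has updated fewer
    # times receives a larger weight, so slow tiers are not drowned out
    # by fast tiers that update the server far more frequently.
    raw = counts.sum() - counts + 1.0
    weights = raw / raw.sum()
    return sum(w * m for w, m in zip(weights, tier_models))

# Toy usage: three tiers, where the third (slowest) tier has updated least.
models = [np.full(4, v) for v in (1.0, 2.0, 3.0)]
print(aggregate_cross_tier(models, tier_update_counts=[10, 6, 2]))
```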