Federated Learning (FL) is a recently emerged decentralized machine learning (ML) framework that combines on-device local training with server-based model synchronization to train a centralized ML model over distributed nodes. In this paper, we propose an asynchronous FL framework with periodic aggregation to eliminate the straggler issue in FL systems. For the proposed model, we investigate several device scheduling and update aggregation policies and compare their performance when the devices have heterogeneous computation capabilities and training data distributions. From the simulation results, we conclude that the scheduling and aggregation design for asynchronous FL can be rather different from the synchronous case. For example, a norm-based significance-aware scheduling policy might not be efficient in an asynchronous FL setting, and an appropriate "age-aware" weighting design for the model aggregation can greatly improve the learning performance of such systems.
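To make the "age-aware" weighting idea concrete, the following minimal Python sketch shows one plausible server-side aggregation step in which each received update is discounted according to its staleness, i.e., the number of aggregation rounds since the global model version it was computed from. The function name `age_aware_aggregate`, the polynomial decay `(age + 1) ** (-alpha)`, and the parameter `alpha` are illustrative assumptions, not the exact design studied in the paper.

```python
import numpy as np

def age_aware_aggregate(global_model, updates, ages, alpha=0.5):
    """Hypothetical age-aware periodic aggregation (illustrative only).

    global_model: current global parameter vector.
    updates: list of model-update vectors received since the last
        aggregation round (one per scheduled device).
    ages: staleness of each update, in aggregation rounds, relative
        to the global model version it was computed from.
    alpha: assumed decay exponent; larger values penalize stale
        updates more aggressively.
    """
    # Staleness-discounted weights: older updates contribute less.
    raw = np.array([(age + 1.0) ** (-alpha) for age in ages])
    weights = raw / raw.sum()  # normalize so the weights sum to one
    aggregated = sum(w * u for w, u in zip(weights, updates))
    return global_model + aggregated

# Example: three updates with staleness 0, 2, and 5 rounds.
global_model = np.zeros(4)
updates = [np.ones(4), 2 * np.ones(4), -np.ones(4)]
print(age_aware_aggregate(global_model, updates, ages=[0, 2, 5]))
```

The key design choice this sketch illustrates is that, unlike synchronous FL where all updates in a round share the same model version, an asynchronous server must decide how much to trust updates computed on outdated global models; weighting by a decreasing function of age is one simple way to encode that trade-off.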