Cross-device Federated Learning (FL) is a distributed learning paradigm with several challenges that differentiate it from traditional distributed learning: variability in the system characteristics of each device and the coordination of millions of clients with a central server are primary ones. Most FL systems described in the literature are synchronous: they perform a synchronized aggregation of model updates from individual clients. Scaling synchronous FL is challenging, since increasing the number of clients training in parallel leads to diminishing returns in training speed, analogous to large-batch training. Moreover, stragglers hinder synchronous FL training. In this work, we outline a production asynchronous FL system design. Our work tackles the aforementioned issues, sketches some of the system design challenges and their solutions, and touches upon principles that emerged from building a production FL system for millions of clients. Empirically, we demonstrate that asynchronous FL converges faster than synchronous FL when training across nearly one hundred million devices. In particular, in high-concurrency settings, asynchronous FL is 5x faster and has nearly 8x less communication overhead than synchronous FL.
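To make the contrast concrete, the following is a minimal sketch of the two aggregation modes: a synchronous server that must wait for every selected client before averaging, and an asynchronous server that applies each update on arrival with a staleness discount. The staleness-weighting rule here is an illustrative assumption, not the exact algorithm of the system described above.

```python
def sync_round(global_model, client_updates):
    """Synchronous FL: wait for ALL selected clients' updates,
    then apply their average. A single straggler delays the round."""
    n = len(client_updates)
    return [g + sum(u[i] for u in client_updates) / n
            for i, g in enumerate(global_model)]

def async_apply(global_model, update, staleness, base_lr=1.0):
    """Asynchronous FL sketch: apply each update as it arrives,
    down-weighting stale updates. The polynomial discount below
    is an assumed choice for illustration."""
    lr = base_lr / (1.0 + staleness) ** 0.5
    return [g + lr * u for g, u in zip(global_model, update)]
```

In the synchronous path, throughput is bounded by the slowest client in each round; in the asynchronous path, the server never blocks, and the staleness weight limits the damage from updates computed against an old model.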