Federated Learning (FL) aims to train high-quality models collaboratively across distributed clients without uploading their local data, and it has attracted increasing attention in both academia and industry. However, a considerable gap remains between flourishing FL research and real-world scenarios, mainly caused by the heterogeneity of devices and the scale at which they are deployed. Most existing works conduct evaluations on homogeneous devices, which do not reflect the diversity and variability of devices in real-world scenarios. Moreover, conducting research and development at scale with heterogeneous devices is challenging because of limited resources and complex software stacks. These two factors, device heterogeneity and scale, are important yet underexplored in FL research: they directly affect FL training dynamics and final performance, so the effectiveness and usability of FL algorithms in practice remain unclear. To bridge this gap, we propose FS-Real, an efficient and scalable prototyping system for real-world cross-device FL. It supports heterogeneous device runtimes, contains an FL server enhanced for parallelism and robustness, and provides implementations of, and extensibility for, advanced FL utility features such as personalization, communication compression, and asynchronous aggregation. To demonstrate the usability and efficiency of FS-Real, we conduct extensive experiments with various device distributions, quantify and analyze the effects of device heterogeneity and scale, and provide insights and open discussions about real-world FL scenarios. Our system is released to help pave the way for further real-world FL research and broad applications involving diverse devices and scales.