Federated Learning (FL) enables clients to collaboratively train machine learning models while protecting their data privacy. Existing FL simulation platforms, designed from the perspective of traditional distributed training, suffer from laborious code migration between simulation and production, low efficiency, low GPU utilization, poor scalability with high hardware requirements, and difficulty in simulating stateful clients. In this work, we first demystify the challenges and bottlenecks of simulating FL, and design a new FL system named FedML \texttt{Parrot}. It improves training efficiency, remarkably relaxes hardware requirements, and supports efficient large-scale FL experiments with stateful clients by: (1) training clients sequentially on devices; (2) decomposing the original aggregation into local and global aggregation, performed on devices and the server respectively; (3) scheduling tasks to mitigate straggler problems and enhance computing utilization; (4) providing a distributed client state manager to support various FL algorithms. Moreover, built upon our generic APIs and communication interfaces, users can seamlessly transform the simulation into a real-world deployment without modifying code. We evaluate \texttt{Parrot} through extensive experiments, training diverse models on various FL datasets, and demonstrate that \texttt{Parrot} can simulate over 1000 clients (stateful or stateless) with flexible GPU device settings ($4 \sim 32$) and high GPU utilization, running 1.2 $\sim$ 4 times faster than FedScale and saving 10 $\sim$ 100 times the memory compared with FedML. We further verify that \texttt{Parrot} works well with homogeneous and heterogeneous devices in three different clusters. Two FL algorithms with stateful clients and four algorithms with stateless clients are simulated to verify the broad adaptability of \texttt{Parrot} to different algorithms.