Federated Learning (FL) is a distributed learning paradigm that empowers edge devices to collaboratively learn a global model leveraging local data. Simulating FL on GPU is essential to expedite FL algorithm prototyping and evaluations. However, current FL frameworks overlook the disparity between algorithm simulation and real-world deployment, which arises from heterogeneous computing capabilities and imbalanced workloads, thus misleading evaluations of new algorithms. Additionally, they lack flexibility and scalability to accommodate resource-constrained clients. In this paper, we present FedHC, a scalable federated learning framework for heterogeneous and resource-constrained clients. FedHC realizes system heterogeneity by allocating a dedicated and constrained GPU resource budget to each client, and also simulates workload heterogeneity in terms of framework-provided runtime. Furthermore, we enhance GPU resource utilization for scalable clients by introducing a dynamic client scheduler, process manager, and resource-sharing mechanism. Our experiments demonstrate that FedHC has the capability to capture the influence of various factors on client execution time. Moreover, despite resource constraints for each client, FedHC achieves state-of-the-art efficiency compared to existing frameworks without limits. When subjecting existing frameworks to the same resource constraints, FedHC achieves a 2.75x speedup. Code has been released on https://github.com/if-lab-repository/FedHC.
翻译:暂无翻译