In cross-device Federated Learning (FL), synchronous methods are hard to scale because stragglers stall the training process. Client availability also varies widely over time due to system heterogeneity and intermittent connectivity. Recent asynchronous FL methods (e.g., FedBuff) address these issues by letting slower clients continue local training on stale models and contribute to aggregation when ready. However, we show empirically that this approach can cause a substantial drop in training accuracy and a slower convergence rate. The primary reason is that fast devices contribute to many more aggregation rounds, while other devices join only intermittently or not at all, and do so with stale model updates. To overcome this barrier, we propose TimelyFL, a heterogeneity-aware asynchronous FL framework with adaptive partial training. During training, TimelyFL adjusts each client's local training workload based on its real-time resource capabilities, so that more of the available clients can join the global update without staleness. We demonstrate the performance benefits of TimelyFL through extensive experiments on various datasets (e.g., CIFAR-10, Google Speech, and Reddit) and models (e.g., ResNet20, VGG11, and ALBERT). Compared with the state of the art (i.e., FedBuff), TimelyFL improves the participation rate by 21.13%, achieves a 1.28x to 2.89x improvement in convergence rate, and increases test accuracy by 6.25%.
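The idea of adjusting local workload to each client's capability can be illustrated with a minimal sketch. This is not the TimelyFL algorithm itself, which the abstract does not specify. It assumes a fixed per-round deadline, per-client time estimates, and training time that scales linearly with the fraction of the model being trained. All names here (`ClientEstimate`, `plan_workload`, `min_fraction`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientEstimate:
    """Hypothetical per-client timing estimates (illustrative only)."""
    client_id: str
    full_epoch_seconds: float  # estimated time to train the full model for one epoch
    upload_seconds: float      # estimated time to upload the resulting update


def plan_workload(est: ClientEstimate,
                  round_deadline_seconds: float,
                  min_fraction: float = 0.1) -> Optional[float]:
    """Return the fraction of the model a client should train this round.

    Assumption: training time scales linearly with the trained fraction.
    Returns None when the client cannot make a useful contribution
    before the deadline.
    """
    if est.full_epoch_seconds <= 0:
        raise ValueError("full_epoch_seconds must be positive")

    # Time left for computation after reserving time for the upload.
    budget = round_deadline_seconds - est.upload_seconds
    if budget <= 0:
        return None

    # Fast clients train the whole model; slow clients train a part of it.
    fraction = min(1.0, budget / est.full_epoch_seconds)
    if fraction < min_fraction:
        return None
    return fraction


if __name__ == "__main__":
    clients = [
        ClientEstimate("fast", full_epoch_seconds=10.0, upload_seconds=2.0),
        ClientEstimate("slow", full_epoch_seconds=100.0, upload_seconds=5.0),
        ClientEstimate("too_slow", full_epoch_seconds=1000.0, upload_seconds=5.0),
    ]
    for c in clients:
        print(c.client_id, plan_workload(c, round_deadline_seconds=30.0))
```

Under these assumptions, a slow client trains a smaller part of the model and returns a fresh update within the same round, rather than returning a full but stale update several rounds later.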