Federated learning (FL) is an emerging, privacy-preserving machine learning paradigm that has drawn tremendous attention in both academia and industry. A unique characteristic of FL is heterogeneity, which arises from the diverse hardware specifications and dynamic states of the participating devices. In principle, heterogeneity can exert a large influence on the FL training process, e.g., by making a device unavailable for training or unable to upload its model updates. Unfortunately, these impacts have never been systematically studied or quantified in the existing FL literature. In this paper, we carry out the first empirical study to characterize the impacts of heterogeneity in FL. We collect large-scale data from 136k smartphones that can faithfully reflect heterogeneity in real-world settings. We also build a heterogeneity-aware FL platform that complies with the standard FL protocol while taking heterogeneity into account. Based on the data and the platform, we conduct extensive experiments to compare the performance of state-of-the-art FL algorithms under heterogeneity-aware and heterogeneity-unaware settings. Results show that heterogeneity causes non-trivial performance degradation in FL, including up to a 9.2% accuracy drop, 2.32x longer training time, and undermined fairness. Furthermore, we analyze potential impact factors and find that device failure and participant bias are two potential causes of the performance degradation. Our study provides insightful implications for FL practitioners. On the one hand, our findings suggest that FL algorithm designers should take heterogeneity into account during evaluation. On the other hand, our findings urge system providers to design specific mechanisms to mitigate the impacts of heterogeneity.
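The difference between heterogeneity-aware and heterogeneity-unaware settings can be illustrated with a minimal sketch of a single FedAvg-style round in Python. This is not the authors' platform. The `Device` fields, the per-device probabilities `p_available` (device is in a state that allows training) and `p_upload_ok` (device finishes and uploads its update), and the `fedavg_round` helper are all hypothetical, and the model is reduced to a plain list of floats. In the aware setting, unavailable devices are never selected and failed uploads are dropped before weighted aggregation. In the unaware setting, every device is assumed to be always available and always able to upload.

```python
import random
from dataclasses import dataclass


@dataclass
class Device:
    device_id: int
    num_samples: int           # local dataset size, used as the aggregation weight
    p_available: float         # hypothetical: chance the device can train this round
    p_upload_ok: float         # hypothetical: chance a selected device uploads in time
    local_update: list[float]  # stand-in for the model delta this device would send


def fedavg_round(global_model, devices, clients_per_round, rng, heterogeneity_aware=True):
    """Run one round; return (new_model, ids of devices whose updates were aggregated)."""
    if heterogeneity_aware:
        # Only devices whose current state allows training can be selected.
        # Rarely available devices are picked less often (participant bias).
        candidates = [d for d in devices if rng.random() < d.p_available]
    else:
        candidates = list(devices)  # unaware: every device is assumed available

    selected = rng.sample(candidates, min(clients_per_round, len(candidates)))

    if heterogeneity_aware:
        # Selected devices may still fail before uploading (device failure).
        uploaded = [d for d in selected if rng.random() < d.p_upload_ok]
    else:
        uploaded = selected  # unaware: every selected device uploads

    if not uploaded:
        return list(global_model), []  # nothing arrived; keep the old model

    # Sample-size-weighted average of the updates that actually arrived.
    total = sum(d.num_samples for d in uploaded)
    new_model = [
        w + sum(d.num_samples * d.local_update[i] for d in uploaded) / total
        for i, w in enumerate(global_model)
    ]
    return new_model, [d.device_id for d in uploaded]
```

Running the same round with `heterogeneity_aware=True` and `False` on identical devices shows how failed uploads shift the aggregate toward the devices that happened to succeed, which is one simple way the participant-bias effect can arise.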