Federated learning (FL) is capable of performing large distributed machine learning tasks across multiple edge users by periodically aggregating trained local parameters. To address key challenges of enabling FL over a wireless fog-cloud system (e.g., non-i.i.d. data and user heterogeneity), we first propose an efficient FL algorithm based on Federated Averaging (called FedFog) that performs local aggregation of gradient parameters at fog servers and the global training update at the cloud. Next, we deploy FedFog in wireless fog-cloud systems by formulating a novel network-aware FL optimization problem that strikes a balance between the global loss and the completion time. An iterative algorithm is then developed to obtain a precise measurement of the system performance, which helps design an efficient stopping criterion to output an appropriate number of global rounds. To mitigate the straggler effect, we propose a flexible user aggregation strategy that trains fast users first to reach a certain level of accuracy before allowing slow users to join the global training updates. Extensive numerical results on several real-world FL tasks are provided to verify the theoretical convergence of FedFog. We also show that the proposed co-design of FL and communication is essential to substantially improve resource utilization while achieving comparable accuracy of the learning model.
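The two-tier aggregation described above (fog servers average their users' gradient parameters, and the cloud combines the fog-level aggregates into a global update) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the weighting by local sample counts, and the single-vector parameter representation are all assumptions made for clarity.

```python
import numpy as np

def fog_aggregate(user_updates, user_weights):
    """One fog server: weighted average of its users' gradient updates.

    user_updates: list of np.ndarray gradient vectors from edge users.
    user_weights: relative weights (e.g., local dataset sizes) - an assumption;
    plain FedAvg-style weighting is used here for illustration.
    """
    w = np.asarray(user_weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    return sum(wi * ui for wi, ui in zip(w, user_updates))

def cloud_update(global_params, fog_aggregates, fog_weights, lr=1.0):
    """Cloud: combine fog-level aggregates and apply the global update.

    The cloud reuses the same weighted average over fog servers, then takes
    a gradient step; the learning rate `lr` is a hypothetical parameter.
    """
    agg = fog_aggregate(fog_aggregates, fog_weights)
    return global_params - lr * agg

# Illustrative round: two users under one fog server, one fog server at the cloud.
users = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
fog_avg = fog_aggregate(users, user_weights=[1, 1])
theta = cloud_update(np.array([5.0, 5.0]), [fog_avg], fog_weights=[1.0], lr=1.0)
```

Because the fog servers pre-average their users' updates, only one aggregate per fog server travels to the cloud each round, which is the communication saving the co-design exploits.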