Federated learning (FL) is a distributed machine learning technology for next-generation AI systems that allows a number of workers, i.e., edge devices, to collaboratively learn a shared global model while keeping their data local to prevent privacy leakage. Enabling FL over wireless multi-hop networks can democratize AI and make it accessible in a cost-effective manner. However, noisy, bandwidth-limited multi-hop wireless connections can lead to delayed and nomadic model updates, which significantly slow down FL convergence. To address these challenges, this paper aims to accelerate FL convergence over the wireless edge by optimizing multi-hop federated networking performance. In particular, the FL convergence optimization problem is formulated as a Markov decision process (MDP). To solve this MDP, multi-agent reinforcement learning (MA-RL) algorithms, along with domain-specific action-space refining schemes, are developed; they learn delay-minimum forwarding paths online to minimize the model exchange latency between the edge devices (i.e., workers) and the remote server. To validate the proposed solutions, FedEdge is developed and implemented; it is the first experimental framework in the literature for FL over multi-hop wireless edge computing networks. FedEdge allows us to rapidly prototype, deploy, and evaluate novel FL algorithms along with RL-based system optimization methods on real wireless devices. Moreover, a physical experimental testbed is implemented by customizing widely adopted Linux wireless routers and ML computing nodes. Finally, our experimental results on the testbed show that the proposed network-accelerated FL system can practically and significantly improve FL convergence speed compared to an FL system empowered by the production-grade, commercially available wireless networking protocol BATMAN-Adv.
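The idea of per-node agents learning delay-minimum forwarding paths online can be illustrated with a minimal sketch in the style of classical Q-routing, where each non-server node keeps `Q[x][y]`, its estimate of the delay to the server when forwarding via neighbor `y`, and refines it from the downstream neighbor's own best estimate. This is a swapped-in textbook technique, not the authors' MA-RL algorithm or their action-space refining scheme. The topology, the fixed per-hop delays (real wireless links would be noisy), and all names (`LINKS`, `train_q_routing`, `greedy_path`, etc.) are hypothetical assumptions for illustration.

```python
import heapq
import random

# Hypothetical multi-hop topology (assumption): symmetric per-hop delays in ms.
LINKS = {
    ("w1", "r1"): 2.0, ("w1", "r2"): 4.0,
    ("w2", "r2"): 1.0, ("w2", "r3"): 6.0,
    ("r1", "r3"): 4.0, ("r2", "r3"): 1.0,
    ("r1", "server"): 9.0, ("r3", "server"): 2.0,
}
SERVER = "server"


def build_neighbors(links):
    """Adjacency map {node: {neighbor: delay}} for an undirected graph."""
    nbrs = {}
    for (a, b), d in links.items():
        nbrs.setdefault(a, {})[b] = d
        nbrs.setdefault(b, {})[a] = d
    return nbrs


def value(q, node):
    """Estimated delay-to-server from `node` (0 at the server itself)."""
    return 0.0 if node == SERVER else min(q[node].values())


def train_q_routing(nbrs, episodes=5000, alpha=0.5, epsilon=0.2, max_hops=20, seed=0):
    """Hypothetical Q-routing-style learner: one agent per non-server node."""
    rng = random.Random(seed)
    # Zero init is optimistic for a cost-minimization problem, which drives exploration.
    q = {x: {y: 0.0 for y in ys} for x, ys in nbrs.items() if x != SERVER}
    sources = sorted(q)
    for _ in range(episodes):
        x = rng.choice(sources)  # a model update enters the network at node x
        for _ in range(max_hops):
            if x == SERVER:
                break
            if rng.random() < epsilon:
                y = rng.choice(sorted(q[x]))   # explore a random neighbor
            else:
                y = min(q[x], key=q[x].get)    # exploit the lowest estimated delay
            # Neighbor y reports its best estimate; x blends it with the hop delay.
            target = nbrs[x][y] + value(q, y)
            q[x][y] += alpha * (target - q[x][y])
            x = y
    return q


def greedy_path(q, src):
    """Follow each agent's lowest-delay choice; bounded to avoid infinite loops."""
    path = [src]
    while path[-1] != SERVER and len(path) <= len(q) + 1:
        path.append(min(q[path[-1]], key=q[path[-1]].get))
    return path


def dijkstra(nbrs, dst):
    """Exact delay-to-dst for every node, used only as a reference check."""
    dist = {dst: 0.0}
    heap = [(0.0, dst)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in nbrs[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


if __name__ == "__main__":
    nbrs = build_neighbors(LINKS)
    q = train_q_routing(nbrs)
    print(greedy_path(q, "w1"), value(q, "w1"))
    print(dijkstra(nbrs, SERVER)["w1"])
```

The learners never see the global topology: each agent only combines its own hop delay with its neighbor's reported estimate, and Dijkstra is included purely to check that the learned greedy paths match the true delay-minimum routes under these assumed fixed delays.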