Federated learning (FL) is a privacy-preserving distributed machine learning paradigm that enables collaborative training among geographically distributed and heterogeneous devices without gathering their data. Extending FL beyond supervised learning, federated reinforcement learning (FRL) was proposed to handle sequential decision-making problems in edge computing systems. However, existing FRL algorithms directly combine model-free RL with FL, often leading to high sample complexity and lacking theoretical guarantees. To address these challenges, we propose a novel FRL algorithm that, for the first time, effectively incorporates model-based RL and ensemble knowledge distillation into FL. Specifically, we use FL and knowledge distillation to create an ensemble of dynamics models from the clients, and then train the policy solely on the ensemble model, without further interaction with the environment. Furthermore, we theoretically prove that the proposed algorithm guarantees monotonic policy improvement. Extensive experimental results demonstrate that our algorithm achieves substantially higher sample efficiency than classic model-free FRL algorithms on challenging continuous-control benchmarks under edge computing settings. The results also highlight the significant impact of heterogeneous client data and the number of local model update steps on FRL performance, validating the insights obtained from our theoretical analysis.
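The core idea described above can be illustrated with a minimal sketch. This is not the paper's algorithm; it is a hypothetical toy in which each client fits a linear dynamics model on private transitions, the server keeps the client models as an ensemble, and a student model is distilled from the ensemble's averaged predictions on a shared unlabeled query set, so that no raw client data is ever pooled and no environment interaction is needed during distillation. All names, dimensions, and noise levels here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy ground-truth dynamics: s' = W s + b (2-D state, no actions,
# purely for illustration).
true_W = np.array([[0.9, 0.1], [0.0, 0.95]])
true_b = np.array([0.05, -0.02])


def client_dynamics_model(n_samples, noise):
    """Fit a linear dynamics model on this client's private transitions."""
    S = rng.normal(size=(n_samples, 2))
    S_next = S @ true_W.T + true_b + rng.normal(scale=noise, size=(n_samples, 2))
    X = np.hstack([S, np.ones((n_samples, 1))])       # append bias column
    theta, *_ = np.linalg.lstsq(X, S_next, rcond=None)
    return theta                                      # (3, 2): stacked [W^T; b]


# Heterogeneous clients: different dataset sizes and noise levels,
# mimicking non-IID edge devices. Only model parameters leave the client.
ensemble = [client_dynamics_model(n, sigma)
            for n, sigma in [(200, 0.05), (50, 0.2), (120, 0.1)]]


def ensemble_predict(S):
    """Average the next-state predictions of all client models."""
    X = np.hstack([S, np.ones((len(S), 1))])
    return np.mean([X @ theta for theta in ensemble], axis=0)


# Knowledge distillation: train a single student dynamics model to match
# the ensemble's predictions on a shared unlabeled query set of states.
S_query = rng.normal(size=(500, 2))
targets = ensemble_predict(S_query)
X = np.hstack([S_query, np.ones((500, 1))])
student, *_ = np.linalg.lstsq(X, targets, rcond=None)

# The distilled student approximates the true dynamics; a policy could now
# be trained entirely inside this learned model, with no environment rollouts.
err = np.linalg.norm(student[:2].T - true_W)
print(f"distillation error: {err:.3f}")
```

In the actual setting the dynamics models would be neural networks and the policy would be optimized with model-generated rollouts, but the data flow is the same: local model fitting, aggregation into an ensemble, and distillation in place of raw-data sharing.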