Federated learning (FL) is a privacy-preserving machine learning paradigm that enables collaborative training among geographically distributed and heterogeneous devices without gathering their data. Extending FL beyond supervised learning models, federated reinforcement learning (FRL) has been proposed to handle sequential decision-making problems in edge computing systems. However, existing FRL algorithms directly combine model-free RL with FL, often leading to high sample complexity and lacking theoretical guarantees. To address these challenges, we propose a novel FRL algorithm that, for the first time, effectively incorporates model-based RL and ensemble knowledge distillation into FL. Specifically, we utilise FL and knowledge distillation to create an ensemble of dynamics models for clients, and then train the policy solely using the ensemble model, without interacting with the environment. Furthermore, we theoretically prove that monotonic improvement of the proposed algorithm is guaranteed. Extensive experimental results demonstrate that our algorithm achieves much higher sample efficiency than classic model-free FRL algorithms on challenging continuous control benchmark environments under edge computing settings. The results also highlight the significant impact of heterogeneous client data and local model update steps on the performance of FRL, validating the insights obtained from our theoretical analysis.
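To make the described workflow concrete, below is a minimal sketch of the overall idea, not the paper's implementation: each client fits a dynamics model on its own transitions, the server aggregates the client models FedAvg-style into a distilled global/ensemble model (a deliberate simplification of the knowledge-distillation step), and the policy is then trained on synthetic rollouts from the learned model rather than on further environment interaction. All dimensions, the linear model class, and the random policy are illustrative assumptions.

```python
# Sketch under simplifying assumptions: linear dynamics models, FedAvg-style averaging
# in place of ensemble knowledge distillation, and a random policy for illustration.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, N_CLIENTS = 4, 2, 3

def fit_local_dynamics(states, actions, next_states):
    """Least-squares fit of a linear dynamics model s' = [s, a] @ W on one client's local data."""
    X = np.hstack([states, actions])                      # (N, STATE_DIM + ACTION_DIM)
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)   # (STATE_DIM + ACTION_DIM, STATE_DIM)
    return W

# Heterogeneous local datasets (stand-ins for each client's own environment interactions).
client_models = []
for _ in range(N_CLIENTS):
    s = rng.normal(size=(200, STATE_DIM))
    a = rng.normal(size=(200, ACTION_DIM))
    s_next = s + 0.1 * a @ rng.normal(size=(ACTION_DIM, STATE_DIM))  # hypothetical dynamics
    client_models.append(fit_local_dynamics(s, a, s_next))

# Server side: the set of client models forms the ensemble; averaging them stands in for
# the distilled global dynamics model used to train the policy.
global_model = np.mean(client_models, axis=0)

def rollout_model(W, policy, s0, horizon=20):
    """Generate a synthetic trajectory from the learned model instead of the real environment."""
    s, traj = s0, []
    for _ in range(horizon):
        act = policy(s)
        s = np.concatenate([s, act]) @ W
        traj.append((s, act))
    return traj

random_policy = lambda s: rng.normal(size=ACTION_DIM)
synthetic_traj = rollout_model(global_model, random_policy, rng.normal(size=STATE_DIM))
print(f"generated {len(synthetic_traj)} model-based transitions without environment interaction")
```

In the full algorithm the dynamics models would be neural networks, the aggregation would use knowledge distillation over the client ensemble, and the policy would be optimised (rather than random) on the model-generated data; the sketch only shows where each of those pieces sits in the pipeline.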