Federated learning (FL) is a privacy-preserving machine learning paradigm that enables collaborative training among geographically distributed and heterogeneous users without gathering their data. Extending FL beyond the conventional supervised learning paradigm, federated reinforcement learning (RL) has been proposed to handle sequential decision-making problems in privacy-sensitive applications such as autonomous driving. However, existing federated RL algorithms directly combine model-free RL with FL, and thus generally suffer from high sample complexity and lack theoretical guarantees. To address these challenges, we propose a new federated RL algorithm that incorporates model-based RL and ensemble knowledge distillation into FL. Specifically, we utilise FL and knowledge distillation to create an ensemble of dynamics models from clients, and then train the policy solely on the ensemble model, without interacting with the real environment. Furthermore, we theoretically prove that the proposed algorithm enjoys a monotonic improvement guarantee. Extensive experimental results demonstrate that our algorithm achieves significantly higher sample efficiency than federated model-free RL algorithms on challenging continuous-control benchmarks. The results also show the impact of non-IID client data and the number of local update steps on the performance of federated RL, validating the insights obtained from our theoretical analysis.
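For concreteness, the sketch below illustrates one communication round of the pipeline the abstract describes: each client fits a local dynamics model on its own transitions, and the server then distills the client ensemble into a global model, on which the policy is subsequently trained without further real-environment interaction. This is a minimal sketch under assumed interfaces; all names (DynamicsModel, fit_local_dynamics, distill_ensemble, federated_round) and hyperparameters are hypothetical placeholders, not the paper's actual implementation.

```python
import copy
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """One-step dynamics model: predicts the next-state delta and reward
    from a (state, action) pair."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def fit_local_dynamics(model, transitions, epochs=10, lr=1e-3):
    """Client-side supervised update: regress (next_state - state, reward)
    from the client's local transitions (s, a, s', r)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    s, a, s_next, r = transitions
    target = torch.cat([s_next - s, r.unsqueeze(-1)], dim=-1)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(model(s, a), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

def distill_ensemble(global_model, client_models, distill_batch,
                     steps=100, lr=1e-3):
    """Server-side ensemble knowledge distillation: fit the global model
    to the mean prediction of the client models on a shared (s, a) batch."""
    opt = torch.optim.Adam(global_model.parameters(), lr=lr)
    s, a = distill_batch
    with torch.no_grad():
        target = torch.stack([m(s, a) for m in client_models]).mean(dim=0)
    for _ in range(steps):
        loss = nn.functional.mse_loss(global_model(s, a), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return global_model

def federated_round(global_model, client_datasets, distill_batch):
    """One FL round: each client fine-tunes a copy of the global model on
    its own data, then the server distills the resulting ensemble back into
    the global model. Policy optimisation (not shown) uses rollouts from
    this global model only, never fresh real-environment interaction."""
    local_models = [fit_local_dynamics(copy.deepcopy(global_model), data)
                    for data in client_datasets]
    return distill_ensemble(global_model, local_models, distill_batch)
```

The sketch only fixes the data flow implied by the abstract: raw client data never leaves the clients, and the policy is improved against the distilled ensemble model rather than the real environment.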