Federated learning (FL) has become a popular tool for solving traditional Reinforcement Learning (RL) tasks. The multi-agent structure addresses the major concern of data hunger in traditional RL, while the federated mechanism protects the data privacy of individual agents. However, the federated mechanism also exposes the system to poisoning by malicious agents that can mislead the trained policy. Despite the advantages brought by FL, the vulnerability of Federated Reinforcement Learning (FRL) has not been well studied before. In this work, we propose the first general framework that characterizes FRL poisoning as an optimization problem constrained by a limited budget, and we design a poisoning protocol that applies to policy-based FRL and extends to FRL with actor-critic as the local RL algorithm by training a pair of private and public critics. We also discuss a conventional defense strategy inherited from FL to mitigate this risk. We verify the effectiveness of our poisoning method through extensive experiments targeting mainstream RL algorithms across various OpenAI Gym environments covering a wide range of difficulty levels. Our results show that the proposed defense protocol is successful in most cases but is not robust under complicated environments. Our work provides new insights into the vulnerability of FL in RL training and poses additional challenges for designing robust FRL algorithms.