Federated learning (FL) has become a popular tool for solving traditional Reinforcement Learning (RL) tasks. The multi-agent structure mitigates the data hunger of traditional RL, while the federated mechanism protects the data privacy of individual agents. However, the federated mechanism also exposes the system to poisoning by malicious agents that can mislead the trained policy. Despite the advantages brought by FL, the vulnerability of Federated Reinforcement Learning (FRL) has not been well studied. In this work, we propose the first general framework to characterize FRL local environment poisoning as an optimization problem constrained by a limited budget, and we design a poisoning protocol that applies to policy-based FRL and can be extended to FRL with actor-critic as the local RL algorithm by training a pair of private and public critics. We also discuss a conventional defense strategy inherited from FL to mitigate this risk. We verify the effectiveness of our poisoning protocol through extensive experiments targeting mainstream RL algorithms over various OpenAI Gym environments covering a wide range of difficulty levels. Our results show that the proposed defense protocol is successful in most cases but is not robust in complicated environments. Our work provides new insights into the vulnerability of FL in RL training and poses additional challenges for designing robust FRL algorithms.
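To make the budget-constrained formulation concrete, the following is a minimal NumPy sketch of the idea, not the paper's exact protocol: a malicious agent steps against a gradient from a private critic trained to degrade the global return, then projects its reported update into an L2 ball of radius `budget` around the update its public critic would honestly produce, so the poisoned update stays close to a benign-looking one. The function names, the learning rate 0.1, and the L2-norm budget are illustrative assumptions.

```python
# A minimal sketch (illustrative, not the authors' exact protocol) of
# budget-constrained local poisoning in FRL.
import numpy as np

def benign_update(theta, grad):
    """Honest local step: ascend the policy gradient (assumed lr = 0.1)."""
    return theta + 0.1 * grad

def poisoned_update(theta, grad_private, grad_public, budget):
    """Malicious local step, constrained by a limited budget.

    grad_private: gradient from the private critic, used to degrade
                  the global return.
    grad_public:  gradient from the public critic, i.e., what an honest
                  agent would compute; serves as the budget's reference.
    """
    honest = theta + 0.1 * grad_public
    poisoned = theta - 0.1 * grad_private   # descend to harm the policy
    delta = poisoned - honest
    norm = np.linalg.norm(delta)
    if norm > budget:                       # project into the budget ball
        delta *= budget / norm
    return honest + delta

# Toy usage with random gradients standing in for real critic outputs.
rng = np.random.default_rng(0)
theta = rng.normal(size=8)
g_pub, g_priv = rng.normal(size=8), rng.normal(size=8)
reported = poisoned_update(theta, g_priv, g_pub, budget=0.05)
print(np.linalg.norm(reported - benign_update(theta, g_pub)))  # <= 0.05
```

The projection step is what lets such an attack slip past a norm-based defense of the kind inherited from FL: the reported update deviates from an honest one by at most `budget`, while still pulling the global policy in a harmful direction over many rounds.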