Recent research has confirmed the feasibility of backdoor attacks in deep reinforcement learning (RL) systems. However, existing attacks require the ability to arbitrarily modify an agent's observation, which constrains their application scope to simple RL systems such as Atari games. In this paper, we migrate backdoor attacks to more complex RL systems involving multiple agents and explore the possibility of triggering the backdoor without directly manipulating the agent's observation. As a proof of concept, we demonstrate that in two-player competitive RL systems, an adversary agent can trigger the backdoor of the victim agent through its own actions. We prototype and evaluate BACKDOORL in four competitive environments. The results show that when the backdoor is activated, the winning rate of the victim drops by 17% to 37% compared to when it is not activated.