Mis-spoke or mis-lead: Achieving Robustness in Multi-Agent Communicative Reinforcement Learning

Recent studies in multi-agent communicative reinforcement learning (MACRL) demonstrate that multi-agent coordination can be significantly improved when agents are allowed to communicate. Meanwhile, advances in adversarial machine learning (ML) have shown that ML and reinforcement learning (RL) models are vulnerable to a variety of attacks that significantly degrade the performance of learned behaviours. However, despite its obvious and growing importance, the combination of adversarial ML and MACRL remains largely uninvestigated. In this paper, we take the first step towards conducting message attacks on MACRL methods. In our formulation, one agent in the cooperating group is taken over by an adversary and can send malicious messages to disrupt a MACRL-based coordinated strategy during the deployment phase. We extend our study by developing a defence method based on message reconstruction. Finally, we address the resulting arms race, i.e., we consider the ability of the malicious agent to adapt to the changing and improving defensive communicative policies of the benign agents. Specifically, we model the adversarial MACRL problem as a two-player zero-sum game and then use Policy-Space Response Oracles (PSRO) to achieve communication robustness. Empirically, we demonstrate that MACRL methods are vulnerable to message attacks, while our defence method and the game-theoretic framework can effectively improve the robustness of MACRL.
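The arms race described above can be illustrated with a minimal sketch of a PSRO-style (double-oracle) loop. This is not the paper's implementation: it assumes a tiny hand-written payoff table in place of simulated MACRL episodes, uses fictitious play as the meta-solver, and replaces the RL best-response training with an exact argmax over a fixed policy set. The names PAYOFF, fictitious_play and psro are hypothetical.

```python
# Toy PSRO (double-oracle) loop on a two-player zero-sum matrix game.
# Assumption: PAYOFF[d][a] is the benign team's return when defender policy d
# faces attacker policy a. A real system would estimate it by simulation.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def fictitious_play(matrix, iters=2000):
    """Approximate equilibrium mixtures; the row player maximises."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_counts = [1] + [0] * (n_rows - 1)
    col_counts = [1] + [0] * (n_cols - 1)
    for _ in range(iters):
        # Each side best-responds to the opponent's empirical play so far.
        row_vals = [sum(matrix[r][c] * col_counts[c] for c in range(n_cols))
                    for r in range(n_rows)]
        col_vals = [sum(matrix[r][c] * row_counts[r] for r in range(n_rows))
                    for c in range(n_cols)]
        row_counts[row_vals.index(max(row_vals))] += 1
        col_counts[col_vals.index(min(col_vals))] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([k / row_total for k in row_counts],
            [k / col_total for k in col_counts])


def psro(payoff, max_iters=10):
    """Grow defender and attacker populations until no new best response appears."""
    defenders, attackers = [0], [0]
    for _ in range(max_iters):
        # Meta-game restricted to the policies found so far.
        sub = [[payoff[d][a] for a in attackers] for d in defenders]
        x, y = fictitious_play(sub)
        # Oracle step: exact best responses (a stand-in for RL training).
        def_vals = [sum(y[j] * payoff[d][a] for j, a in enumerate(attackers))
                    for d in range(len(payoff))]
        att_vals = [sum(x[i] * payoff[d][a] for i, d in enumerate(defenders))
                    for a in range(len(payoff[0]))]
        new_d = def_vals.index(max(def_vals))
        new_a = att_vals.index(min(att_vals))
        grew = False
        if new_d not in defenders:
            defenders.append(new_d)
            grew = True
        if new_a not in attackers:
            attackers.append(new_a)
            grew = True
        if not grew:
            break
    # Re-solve the final meta-game so the mixtures match the final populations.
    sub = [[payoff[d][a] for a in attackers] for d in defenders]
    x, y = fictitious_play(sub)
    value = sum(x[i] * y[j] * sub[i][j]
                for i in range(len(defenders)) for j in range(len(attackers)))
    return defenders, attackers, x, y, value
```

Calling psro(PAYOFF) returns the two populations, their meta-strategies, and the approximate game value for the defender. In this sketch the defender's mixture over its population plays the role of the robust communicative policy, and the loop stops once neither side can find a best response outside its current population.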
