Communication is important in many multi-agent reinforcement learning (MARL) problems, as it lets agents share information and make good decisions. However, when trained communicative agents are deployed in real-world applications where noise and potential attackers exist, the safety of communication-based policies becomes a serious and underexplored issue. Specifically, if communication messages are manipulated by malicious attackers, agents relying on untrustworthy communication may take unsafe actions that lead to catastrophic consequences. It is therefore crucial to ensure that agents are not misled by corrupted communication while still benefiting from benign communication. In this work, we consider an environment with $N$ agents, where the attacker may arbitrarily change the communication from any $C<\frac{N-1}{2}$ agents to a victim agent. For this strong threat model, we propose a certifiable defense by constructing a message-ensemble policy that aggregates multiple randomly ablated message sets. Theoretical analysis shows that this message-ensemble policy can utilize benign communication while being certifiably robust to adversarial communication, regardless of the attacking algorithm. Experiments in multiple environments verify that our defense significantly improves the robustness of trained policies against various types of attacks.
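The idea of aggregating ablated message sets can be illustrated with a minimal sketch. The abstract does not specify the aggregation rule, the ablation size, or how subsets are sampled, so everything below is an assumption made for illustration: discrete actions combined by majority vote, a hypothetical `base_policy(observation, messages)` callable, a hypothetical ablation size `k`, and a toy threshold policy. The helper `clean_subset_fraction` only computes a combinatorial quantity (the share of size-`k` subsets that contain no attacked message) and should not be read as the paper's certification condition.

```python
import itertools
import math
import random
from collections import Counter


def clean_subset_fraction(num_senders, num_attacked, k):
    """Fraction of size-k message subsets that contain no attacked message."""
    return math.comb(num_senders - num_attacked, k) / math.comb(num_senders, k)


def ensemble_action(base_policy, observation, messages, k, num_samples=None, rng=None):
    """Majority vote over actions taken on size-k subsets of the messages.

    base_policy is a hypothetical callable (observation, messages) -> action.
    If num_samples is given, that many subsets are drawn at random; otherwise
    every size-k subset is used.
    """
    subsets = list(itertools.combinations(range(len(messages)), k))
    if num_samples is not None and num_samples < len(subsets):
        rng = rng or random.Random(0)
        subsets = rng.sample(subsets, num_samples)
    votes = Counter(
        base_policy(observation, tuple(messages[i] for i in idx)) for idx in subsets
    )
    return votes.most_common(1)[0][0]


# Toy setting (assumed): N = 7 agents, so the victim receives 6 messages,
# and C = 2 of them are attacked, which satisfies C < (N - 1) / 2 = 3.
def toy_policy(observation, msgs):
    # Act "1" when the average received signal is positive, else "0".
    return 1 if sum(msgs) / len(msgs) > 0 else 0


messages = [1.0, 1.0, 1.0, 1.0, -100.0, -100.0]  # last two are adversarial

print(clean_subset_fraction(6, 2, 1))                  # 4/6 of singletons are clean
print(ensemble_action(toy_policy, None, messages, 1))  # 1: benign majority wins
print(ensemble_action(toy_policy, None, messages, 6))  # 0: using all messages is misled
```

In this toy setting, a policy that consumes all six messages at once is dragged to the wrong action by the two corrupted ones, while voting over single-message subsets recovers the benign action because most of those subsets are untouched by the attacker. With `k = 2` only 6 of the 15 subsets are clean, so the choice of ablation size matters in this sketch.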