Attacking Cooperative Multi-Agent Reinforcement Learning by Adversarial Minority Influence

Cooperative multi-agent reinforcement learning (c-MARL) offers a general paradigm for a group of agents to achieve a shared goal by making individual decisions, yet it is known to be vulnerable to adversarial attacks. Though harmful, adversarial attacks also play a critical role in evaluating the robustness of c-MARL algorithms and finding their blind spots. However, existing attacks are neither strong nor practical enough, mainly because they ignore the complex influence between agents and the cooperative nature of the victims in c-MARL. In this paper, we propose adversarial minority influence (AMI), the first practical attack against c-MARL that works by introducing an adversarial agent. AMI addresses the aforementioned problems by unilaterally influencing the other cooperative victims toward a targeted worst-case cooperation. Technically, to maximally deviate the victim policies under complex agent-wise influence, our unilateral attack characterizes and maximizes the influence from the adversary to the victims. This is done by adapting a unilateral agent-wise relation metric derived from mutual information, which filters out the detrimental influence from the victims to the adversary. To fool the victims into a jointly worst-case failure, our targeted attack influences the victims toward a long-term, cooperatively worst case by distracting each victim to a specific target. This target is learned by a reinforcement learning agent in a trial-and-error process. Extensive experiments in simulated discrete control (SMAC) and continuous control (MAMujoco) environments, as well as in real-world robot swarm control, demonstrate the superiority of our AMI approach. Our code is available at https://anonymous.4open.science/r/AMI.
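To make the idea of a one-directional, mutual-information-based influence measure more concrete, the following is a minimal sketch and not the metric used by AMI, whose exact form the abstract does not give. It assumes discrete actions and scores one adversary action by the KL divergence between the victim's action distribution given that action and the victim's distribution averaged over counterfactual adversary actions. The expectation of this score over the adversary's actions equals the mutual information between the adversary's action and the victim's action. All names (`unilateral_influence`, `victim_policy`, `adversary_prior`) are hypothetical and chosen for illustration only.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with p_i = 0 contribute nothing
    )


def marginal_victim_policy(victim_policy, adversary_prior):
    """Average the victim's conditional policy over counterfactual adversary actions."""
    n_actions = len(next(iter(victim_policy.values())))
    return [
        sum(adversary_prior[a] * victim_policy[a][k] for a in adversary_prior)
        for k in range(n_actions)
    ]


def unilateral_influence(victim_policy, adversary_prior, adversary_action):
    """Influence of one adversary action on the victim (illustrative assumption).

    victim_policy: dict mapping each adversary action to the victim's
        action distribution given that action.
    adversary_prior: dict mapping each adversary action to its probability.
    Only the adversary-to-victim direction is measured; nothing here depends
    on how the victim affects the adversary.
    """
    marginal = marginal_victim_policy(victim_policy, adversary_prior)
    return kl_divergence(victim_policy[adversary_action], marginal)


def expected_influence(victim_policy, adversary_prior):
    """Expectation of the per-action score, i.e. I(adversary action; victim action)."""
    return sum(
        adversary_prior[a] * unilateral_influence(victim_policy, adversary_prior, a)
        for a in adversary_prior
    )


if __name__ == "__main__":
    prior = {"left": 0.5, "right": 0.5}
    # A victim that copies the adversary is influenced strongly.
    following = {"left": [1.0, 0.0], "right": [0.0, 1.0]}
    # A victim that ignores the adversary is not influenced at all.
    ignoring = {"left": [0.5, 0.5], "right": [0.5, 0.5]}
    print(expected_influence(following, prior))
    print(expected_influence(ignoring, prior))
```

In an attack of the kind described above, a score like this could serve as a reward signal for the adversary, so that it prefers actions that change what the victims do; steering the victims toward a specific worst-case target would require an additional learned component that this sketch does not cover.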
