Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate the fundamental properties of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, under which agents aim to maximize the worst-case expected state value. We prove the existence of a robust agent policy for SAMGs with finite state and finite action spaces. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is publicly available at https://songyanghan.github.io/what_is_solution/.
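As an illustrative sketch of the robust agent policy objective mentioned above (the notation here, including \pi, \rho, d_0, and V^{\pi,\rho}, is our own shorthand and not necessarily the paper's), maximizing the worst-case expected state value can be written as

    % sketch only: agents maximize value under the worst-case state-perturbation adversary
    \max_{\pi} \; \min_{\rho} \; \mathbb{E}_{s_0 \sim d_0}\!\left[ V^{\pi,\rho}(s_0) \right],

where \pi denotes the joint agent policy, \rho the adversary's state-perturbation policy, d_0 the initial state distribution, and V^{\pi,\rho} the state value induced by both; the precise definitions in the paper may differ.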