Cooperative multi-agent reinforcement learning (c-MARL) is widely applied in safety-critical scenarios, making the robustness analysis of c-MARL models profoundly important. However, robustness certification for c-MARL has not yet been explored by the community. In this paper, we propose a novel certification method, the first to leverage a scalable approach for c-MARL to determine actions with guaranteed certified bounds. Compared with single-agent systems, c-MARL certification poses two key challenges: (i) the accumulation of uncertainty as the number of agents increases; (ii) the potential lack of impact of changing a single agent's action on the global team reward. These challenges prevent us from directly applying existing algorithms. Hence, we employ a false discovery rate (FDR) controlling procedure that accounts for the importance of each agent to certify per-state robustness, and we propose a tree-search-based algorithm to find a lower bound on the global reward under the minimal certified perturbation. As our method is general, it can also be applied in single-agent environments. We empirically show that our certification bounds are much tighter than those of state-of-the-art RL certification solutions. We also run experiments on two popular c-MARL algorithms, QMIX and VDN, in two different environments, with two and four agents. The experimental results show that our method produces meaningful guaranteed robustness for all models and environments. Our tool CertifyCMARL is available at https://github.com/TrustAI/CertifyCMA
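To make the per-state certification step concrete, the sketch below illustrates the standard Benjamini-Hochberg FDR controlling procedure in Python. It is a minimal illustration only: it assumes one p-value per agent (e.g., from a binomial test on the action counts of a randomized-smoothed policy) and uses the unweighted procedure rather than the paper's importance-aware variant.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg FDR controlling procedure.

    Given one p-value per agent, return a boolean mask of the
    hypotheses that can be rejected (i.e., agent actions that can be
    certified) while keeping the expected false discovery rate
    below `alpha`.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                       # ranks, ascending p-values
    ranked = p[order]
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])        # largest rank meeting the bound
        rejected[order[: k + 1]] = True         # reject all hypotheses up to rank k
    return rejected

# Hypothetical usage with four agents' p-values:
print(benjamini_hochberg([0.001, 0.008, 0.04, 0.3], alpha=0.05))
```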
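The tree-search idea can likewise be sketched under simplifying assumptions. Below, `certified_actions(state)` is a hypothetical callback returning, for each agent, the set of actions the smoothed policy could take under the certified perturbation, and `env_step` is a hypothetical deterministic simulator; neither is the paper's actual interface. Exhaustively minimizing over all certified-reachable branches yields a valid lower bound on the team reward, at exponential cost in depth.

```python
from itertools import product

def reward_lower_bound(env_step, state, certified_actions, depth):
    """Minimal exhaustive-tree-search sketch for a reward lower bound.

    Recursively enumerates every joint action consistent with the
    per-agent certified action sets and returns the worst cumulative
    team reward over all such branches up to `depth` steps.
    """
    if depth == 0:
        return 0.0
    worst = float("inf")
    for joint_action in product(*certified_actions(state)):
        reward, next_state, done = env_step(state, joint_action)
        total = reward if done else reward + reward_lower_bound(
            env_step, next_state, certified_actions, depth - 1)
        worst = min(worst, total)
    return worst
```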