Single-agent reinforcement learning algorithms applied in a multi-agent environment are inadequate for fostering cooperation. If intelligent agents are to interact and work together to solve complex problems, methods that counter non-cooperative behavior are needed to facilitate the training of multiple agents. This is the goal of cooperative AI. Recent work in adversarial machine learning, however, shows that models (e.g., image classifiers) can easily be deceived into making incorrect decisions. In addition, some past research in cooperative AI has relied on new notions of representation, such as public beliefs, to accelerate the learning of optimally cooperative behavior. Hence, cooperative AI may introduce new weaknesses not investigated in previous machine learning research. In this paper, our contributions include: (1) an argument that three algorithms inspired by human-like social intelligence introduce new vulnerabilities, unique to cooperative AI, that adversaries can exploit, and (2) an experiment showing that simple adversarial perturbations of the agents' beliefs can negatively impact performance. This evidence points to the possibility that formal representations of social behavior are vulnerable to adversarial attacks.
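To make the threat model concrete, the following is a minimal sketch of the kind of belief perturbation referred to above: a fast-gradient-sign-style attack on a belief vector fed to a policy network. The network architecture, belief dimensionality, and step size `epsilon` are illustrative assumptions for this sketch, not the setup used in the paper's experiment.

```python
# Minimal sketch (not the paper's exact method) of an FGSM-style
# perturbation applied to an agent's belief vector. The policy network,
# its sizes, and epsilon below are illustrative assumptions.
import torch
import torch.nn as nn

belief_dim, n_actions, epsilon = 16, 4, 0.05

# Hypothetical policy that maps a (public) belief vector to action logits.
policy = nn.Sequential(
    nn.Linear(belief_dim, 32),
    nn.ReLU(),
    nn.Linear(32, n_actions),
)

belief = torch.rand(1, belief_dim, requires_grad=True)  # stand-in belief state
logits = policy(belief)
original_action = logits.argmax(dim=1)                  # action the agent would take

# Compute the loss of the originally chosen action, then step the belief
# in the direction of the sign of the gradient so that this loss increases.
loss = nn.functional.cross_entropy(logits, original_action)
loss.backward()
perturbed_belief = (belief + epsilon * belief.grad.sign()).detach()

with torch.no_grad():
    new_action = policy(perturbed_belief).argmax(dim=1)

print("original action:", original_action.item(),
      "| action after perturbation:", new_action.item())
```

Even a small `epsilon` of this kind can change the action selected by the policy, which is the failure mode the experiment in this paper examines for belief-based cooperative agents.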