Autonomous intelligent agents deployed in the real world need to be robust against adversarial attacks on their sensory inputs. Existing work in reinforcement learning focuses on minimum-norm perturbation attacks, which were originally introduced to mimic a notion of perceptual invariance in computer vision. In this paper, we note that such minimum-norm perturbation attacks can be trivially detected by victim agents, as they result in observation sequences that are inconsistent with the victim agent's actions. Furthermore, many real-world agents, such as physical robots, commonly operate under human supervisors, who are not susceptible to such perturbation attacks. We therefore propose to focus instead on illusionary attacks, a novel form of attack that is consistent with the victim agent's world model. We provide a formal definition of this novel attack framework, explore its characteristics under a variety of conditions, and conclude that agents must seek realism feedback to be robust to illusionary attacks.
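To make the detection argument concrete, the sketch below (not from the paper; the toy linear dynamics, function names, and threshold are all illustrative assumptions) shows how a victim agent with an approximate world model could flag minimum-norm perturbations: an observation perturbed independently of the dynamics is inconsistent with the action the agent just took, whereas an illusionary attack is by construction designed to pass such a consistency check.

```python
# Illustrative sketch (hypothetical, not the paper's method): a victim agent
# compares each incoming observation against the observation its own world
# model predicts from its action history. Perturbations injected without
# regard to the dynamics break this consistency and can be flagged.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy transition matrix (assumed)
B = np.array([[0.0], [0.1]])             # toy control matrix (assumed)

def predict_next(obs, action):
    """Victim's world-model prediction of the next observation."""
    return A @ obs + B @ action

def is_consistent(predicted_obs, received_obs, threshold=0.05):
    """Accept observations that stay close to the model's prediction."""
    return np.linalg.norm(predicted_obs - received_obs) < threshold

obs = np.zeros(2)
for t in range(20):
    action = rng.normal(size=1)
    true_next = predict_next(obs, action) + rng.normal(scale=0.001, size=2)
    # A small L2-bounded perturbation applied to the observation alone,
    # independent of the dynamics, is inconsistent with the action taken.
    attacked_next = true_next + 0.2 * rng.normal(size=2)
    if not is_consistent(predict_next(obs, action), attacked_next):
        print(f"t={t}: observation inconsistent with own action -> attack suspected")
    obs = true_next
```

An illusionary attack, in contrast, feeds the victim an observation sequence that remains consistent with its world model, so a check of this kind offers no signal; this is what motivates seeking external realism feedback.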