Recent work has discovered that deep reinforcement learning (DRL) policies are vulnerable to adversarial examples. These attacks mislead the policy of DRL agents by perturbing the state of the environment observed by the agents. They are feasible in principle, but too slow to fool DRL policies in real time. We propose a new attack that fools DRL policies and is both effective and efficient enough to be mounted in real time. We use the Universal Adversarial Perturbation (UAP) method to compute effective perturbations that are independent of the individual inputs to which they are applied. Through an extensive evaluation on Atari 2600 games, we show that our technique is effective: it fully degrades the performance of both deterministic and stochastic policies (by up to 100%, even when the $l_\infty$ bound on the perturbation is as small as 0.005). We also show that our attack is efficient, incurring an online computational cost of 0.027 ms on average. This is faster than the response time of agents with different DRL policies (0.6 ms on average) and considerably faster than prior attacks (2.7 ms on average). Furthermore, we demonstrate that known defenses are ineffective against universal perturbations. We propose an effective detection technique that can form the basis of robust defenses against attacks based on universal perturbations.
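To make the low online cost concrete, below is a minimal sketch (not the authors' implementation) of how a precomputed universal perturbation could be applied to observations at inference time under an $l_\infty$ bound. The names `perturb_observation`, `act_under_attack`, `policy`, and `uap` are illustrative assumptions; the offline computation of `uap` (the UAP procedure itself) is not shown.

```python
# Minimal sketch, assuming the universal perturbation `uap` was computed
# offline with a UAP-style procedure and observations are scaled to [0, 1].
import numpy as np

EPS = 0.005  # l_inf bound on the perturbation (value quoted in the abstract)

def perturb_observation(obs: np.ndarray, uap: np.ndarray, eps: float = EPS) -> np.ndarray:
    """Add the fixed, input-independent perturbation, clipped to the l_inf ball,
    and keep the result in the valid observation range."""
    delta = np.clip(uap, -eps, eps)        # enforce ||delta||_inf <= eps
    return np.clip(obs + delta, 0.0, 1.0)  # stay in the valid pixel range

def act_under_attack(policy, obs: np.ndarray, uap: np.ndarray):
    """The online step is only an addition and two clips, which is why the
    attack can run faster than the agent's own response time."""
    return policy(perturb_observation(obs, uap))
```

Because the same `uap` is reused for every observation, no per-input optimization is needed at attack time; this is what distinguishes the universal-perturbation setting from prior per-state attacks.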