Deep reinforcement learning (DRL) is vulnerable to adversarial perturbations. Adversaries can mislead the policies of DRL agents by perturbing the state of the environment observed by the agents. Existing attacks are feasible in principle but face challenges in practice: they are either too slow to fool DRL policies in real time or require modifying past observations stored in the agent's memory. We show that Universal Adversarial Perturbations (UAP), independent of the individual inputs to which they are applied, can fool DRL policies effectively and in real time. We introduce three attack variants leveraging UAP. Via an extensive evaluation using three Atari 2600 games, we show that our attacks are effective, as they fully degrade the performance of three different DRL agents (by up to 100%, even when the $l_\infty$ bound on the perturbation is as small as 0.01). Our attacks are faster than the frame rate of image capture (60 Hz) and considerably faster than prior attacks ($\approx 1.8$ ms). Our attack technique is also efficient, incurring an online computational cost of only $\approx 0.027$ ms. Using two tasks involving robotic movement, we confirm that our results generalize to more complex DRL tasks. Furthermore, we demonstrate that the effectiveness of known defenses diminishes against universal perturbations. We introduce an effective technique that detects all known adversarial perturbations against DRL policies, including all the universal perturbations presented in this paper.
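To illustrate why the online cost of a UAP attack is so low, the following minimal sketch (with hypothetical names such as `apply_uap`; not the paper's actual implementation) shows that once the perturbation has been computed offline, applying it to each observation at run time reduces to an element-wise addition and clipping:

```python
# Minimal sketch, assuming a precomputed universal perturbation `uap`
# with the same shape as each observation (e.g., a normalized Atari frame
# in [0, 1]). All names here are illustrative, not from the paper.
import numpy as np

EPS = 0.01  # example l_inf bound on the perturbation, as in the evaluation

def apply_uap(observation: np.ndarray, uap: np.ndarray) -> np.ndarray:
    """Apply a precomputed universal perturbation to one observation.

    Because `uap` is input-independent and computed ahead of time, the
    only per-frame work is an add and two clips, which is why the online
    cost stays far below the ~16.7 ms budget of a 60 Hz frame rate.
    """
    perturbed = observation + np.clip(uap, -EPS, EPS)  # enforce l_inf bound
    return np.clip(perturbed, 0.0, 1.0)                # keep a valid image
```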