Recent studies have shown that deep reinforcement learning (DRL) policies are vulnerable to adversarial attacks, which raises concerns about applying DRL to safety-critical systems. In this work, we take a principled approach and study the robustness of DRL policies to adversarial attacks from the perspective of robust optimization. Within this framework, the optimal adversarial attack is the one that minimizes the expected return of the policy, and correspondingly a good defense mechanism should improve the worst-case performance of the policy. Considering that attackers generally have no access to the training environment, we propose a greedy attack algorithm, which seeks to minimize the expected return of the policy without interacting with the environment, and a defense algorithm, which performs adversarial training in a max-min form. Experiments on Atari game environments show that our attack algorithm is more effective and leads to lower returns of the policy than existing attack algorithms, and that our defense algorithm yields policies more robust than existing defense methods against a range of adversarial attacks (including our proposed attack algorithm).
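As a rough illustration of the robust optimization view described above, the defense can be read as an outer maximization over policy parameters wrapped around the attacker's inner minimization. The notation below is an assumption for exposition (policy $\pi_\theta$, state-dependent perturbation $\nu$ bounded in an $\ell_\infty$ ball of radius $\epsilon$, discount factor $\gamma$); it is a sketch of the max-min objective, not the exact formulation used in the paper:
$$
\max_{\theta}\;\min_{\nu:\,\|\nu(s)\|_\infty \le \epsilon}\;
\mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right],
\qquad a_t \sim \pi_\theta\!\bigl(\cdot \mid s_t + \nu(s_t)\bigr).
$$
Under this reading, the proposed attack corresponds to (approximately) solving the inner minimization without access to the environment, while the defense optimizes the outer maximization, i.e., the worst-case return.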