Evaluating the worst-case performance of a reinforcement learning (RL) agent under the strongest/optimal adversarial perturbations on state observations (within some constraints) is crucial for understanding the robustness of RL agents. However, finding the optimal adversary is challenging, in terms of both whether we can find the optimal attack and how efficiently we can find it. Existing works on adversarial RL either use heuristic methods that may not find the strongest adversary, or directly train an RL-based adversary by treating the agent as a part of the environment, which can find the optimal adversary but may become intractable in a large state space. In this paper, we propose a novel attacking algorithm which has an RL-based "director" searching for the optimal policy perturbation, and an "actor" crafting state perturbations following the directions from the director (i.e., the actor executes targeted attacks). Our proposed algorithm, PA-AD, is theoretically optimal against an RL agent and significantly improves efficiency compared with prior RL-based works in environments with large or pixel state spaces. Empirical results show that our proposed PA-AD universally outperforms state-of-the-art attacking methods in a wide range of environments. Our method can be easily applied to any RL algorithm to evaluate and improve its robustness.
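The director/actor decomposition described above can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: the victim is a stand-in softmax-linear policy, the "director" here is a hypothetical heuristic that simply targets the victim's least-preferred action (in PA-AD the director is a learned RL policy searching over policy perturbations), and the "actor" runs a projected sign-gradient targeted attack inside an l-infinity budget. All names, sizes, and the budget `EPS` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, N_ACTIONS, EPS = 4, 3, 0.1  # hypothetical sizes and l_inf attack budget

# Stand-in victim: a fixed softmax-linear policy (NOT the paper's trained agent).
W = rng.normal(size=(N_ACTIONS, STATE_DIM))

def victim_policy(s):
    """Victim's action distribution: softmax over linear logits."""
    z = W @ s
    z = z - z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

def director(s):
    """Hypothetical director: proposes a policy-perturbation direction.
    Here it just targets the victim's least-preferred action; in PA-AD this
    direction comes from an RL policy trained to minimize the victim's return."""
    return int(np.argmin(victim_policy(s)))

def actor(s, target, steps=50, lr=0.05):
    """Actor: crafts a targeted state perturbation within an l_inf ball of
    radius EPS that pushes the victim toward the director's target action.
    For a softmax-linear policy, d log p[target] / d s = W[target] - p @ W."""
    best, best_p = s.copy(), victim_policy(s)[target]
    s_adv = s.copy()
    for _ in range(steps):
        p = victim_policy(s_adv)
        grad = W[target] - p @ W                       # ascent direction
        s_adv = np.clip(s_adv + lr * np.sign(grad),    # FGSM-style step
                        s - EPS, s + EPS)              # project into budget
        p_t = victim_policy(s_adv)[target]
        if p_t > best_p:                               # keep best iterate
            best, best_p = s_adv.copy(), p_t
    return best

# One attack step: director picks the target, actor realizes it in state space.
s = rng.normal(size=STATE_DIM)
a_target = director(s)
s_adv = actor(s, a_target)
```

The key point of the decomposition is that the director's search space is the (small) policy-perturbation space rather than the (large, possibly pixel) state space; the actor's inner optimization is a standard targeted attack and needs no training.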