Safe reinforcement learning (RL) trains a policy to maximize the task reward while satisfying safety constraints. While prior works focus on performance optimality, we find that the optimal solutions of many safe RL problems are neither robust nor safe against carefully designed observational perturbations. We formally analyze the unique properties of designing effective observational adversarial attackers in the safe RL setting. We show that baseline adversarial attack techniques for standard RL tasks are not always effective for safe RL and propose two new approaches - one maximizes the cost and the other maximizes the reward. One interesting and counter-intuitive finding is that the maximum-reward attack is strong, as it can both induce unsafe behaviors and keep the attack stealthy by maintaining the reward. We further propose a robust training framework for safe RL and evaluate it via comprehensive experiments. This paper provides pioneering work on investigating the safety and robustness of RL under observational attacks, to inform future safe RL studies. Code is available at: \url{https://github.com/liuzuxin/safe-rl-robustness}
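As a minimal sketch of the two attack objectives mentioned above (the notation here is assumed for illustration, not taken from the abstract: $\nu$ is an observational perturbation constrained to a set $B_\epsilon(s)$ around the true state $s$, $\pi$ is the fixed victim policy, and $r$, $c$ are the reward and cost functions), the maximum-cost (MC) and maximum-reward (MR) attackers could be written as
\begin{align*}
\nu_{\mathrm{MC}} &\in \arg\max_{\nu:\,\nu(s)\in B_\epsilon(s)} \; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t c(s_t, a_t) \;\middle|\; a_t \sim \pi(\cdot \mid \nu(s_t))\right], \\
\nu_{\mathrm{MR}} &\in \arg\max_{\nu:\,\nu(s)\in B_\epsilon(s)} \; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t r(s_t, a_t) \;\middle|\; a_t \sim \pi(\cdot \mid \nu(s_t))\right],
\end{align*}
where the MC attacker directly drives up the cumulative cost, while the MR attacker keeps the reward high and can therefore remain stealthy while still leading the constrained policy into unsafe behavior.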