Reinforcement learning (RL) has achieved tremendous success in many complex decision-making tasks. However, deploying RL in real-world domains such as autonomous driving and robotics raises safety concerns, leading to a growing demand for safe RL algorithms. While safety control has a long history, the study of safe RL algorithms is still in its early stages. To lay a solid foundation for future research in this direction, this paper provides a review of safe RL from the perspectives of methods, theory, and applications. First, we review the progress of safe RL along five dimensions and identify five problems, coined "2H3W", that are crucial for deploying safe RL in real-world applications. Second, we analyze the progress of theory and algorithms from the perspective of answering the "2H3W" problems. We then review and discuss the sample complexity of safe RL methods, followed by an introduction to the applications and benchmarks of safe RL algorithms. Finally, we discuss challenging open problems in safe RL, hoping to inspire further research in this direction. To advance the study of safe RL algorithms, we release a benchmark suite: an open-source repository containing implementations of major safe RL algorithms, along with tutorials, at https://github.com/chauncygu/Safe-Reinforcement-Learning-Baselines.git.