Reinforcement learning (RL) has achieved tremendous success in many complex decision-making tasks. However, deploying RL in real-world domains such as autonomous driving and robotics raises safety concerns, leading to a growing demand for safe reinforcement learning algorithms. While safety control has a long history, the study of safe RL algorithms is still in its early stages. To establish a solid foundation for future research in this direction, this paper provides a review of safe RL from the perspectives of methods, theory, and applications. First, we review the progress of safe RL from five dimensions and identify five problems that are crucial for deploying safe RL in real-world applications, coined "2H3W". Second, we analyze progress in theory and algorithms from the perspective of answering the "2H3W" problems. We then review and discuss the sample complexity of safe RL methods, followed by an introduction to the applications and benchmarks of safe RL algorithms. Finally, we discuss the challenging open problems in safe RL, hoping to inspire further research in this direction. To advance the study of safe RL algorithms, we release a benchmark suite, an open-sourced repository containing implementations of major safe RL algorithms, along with tutorials, at: https://github.com/chauncygu/Safe-Reinforcement-Learning-Baselines.git.