Reinforcement Learning (RL) is essentially a trial-and-error learning procedure, and the actions it tries during exploration and exploitation can be unsafe. This hinders the application of RL to real-world control problems, especially those involving safety-critical systems. In this paper, we introduce a framework for safe RL that integrates an RL algorithm with an add-on safety supervision module, called the Robust Action Governor (RAG), which uses set-theoretic techniques and online optimization to manage safety-related requirements during learning. We illustrate the proposed safe RL framework through an application to automotive adaptive cruise control.
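To make the idea of an add-on safety supervisor concrete, the following is a minimal sketch of an action filter for a toy car-following model. It is not the RAG formulation from the paper: the safe set here is a hand-picked stopping-distance condition standing in for a properly computed robust invariant set, the online optimization is a simple grid search, and all names and numbers (`is_safe`, `step`, `govern`, `DT`, `A_MIN`, `A_MAX`, `LEAD_BRAKE`, `D_MIN`) are hypothetical choices made for illustration.

```python
import numpy as np

# Illustrative constants (assumptions, not values from the paper).
DT = 0.1                   # time step [s]
A_MIN, A_MAX = -3.0, 2.0   # ego acceleration bounds [m/s^2]
LEAD_BRAKE = 3.0           # assumed worst-case lead deceleration [m/s^2]
D_MIN = 5.0                # minimum allowed gap [m]


def is_safe(gap, v_ego, v_lead):
    """Assumed safe set: if both cars brake to a stop, the gap stays >= D_MIN."""
    stop_ego = v_ego ** 2 / (2.0 * -A_MIN)
    stop_lead = v_lead ** 2 / (2.0 * LEAD_BRAKE)
    return gap + stop_lead - stop_ego >= D_MIN


def step(gap, v_ego, v_lead, a_ego, a_lead):
    """One step of a simple kinematic model; speeds are clipped at zero."""
    v_ego_next = max(v_ego + a_ego * DT, 0.0)
    v_lead_next = max(v_lead + a_lead * DT, 0.0)
    # Trapezoidal update of the gap from the average relative speed.
    gap_next = gap + 0.5 * ((v_lead + v_lead_next) - (v_ego + v_ego_next)) * DT
    return gap_next, v_ego_next, v_lead_next


def govern(a_rl, gap, v_ego, v_lead, n_grid=51):
    """Return the admissible acceleration closest to the RL action a_rl.

    An action is admissible if the next state stays in the assumed safe set
    when the lead car brakes as hard as possible. For this model that is the
    worst case, because the safety margin grows with the lead car's speed.
    """
    candidates = np.linspace(A_MIN, A_MAX, n_grid)
    admissible = [
        a for a in candidates
        if is_safe(*step(gap, v_ego, v_lead, a, -LEAD_BRAKE))
    ]
    if not admissible:
        return A_MIN  # fallback: brake as hard as possible
    a_rl = float(np.clip(a_rl, A_MIN, A_MAX))
    return float(min(admissible, key=lambda a: abs(a - a_rl)))


if __name__ == "__main__":
    # Large gap: the RL action passes through unchanged.
    print(govern(2.0, gap=100.0, v_ego=20.0, v_lead=20.0))
    # Small gap: the requested acceleration is overridden with braking.
    print(govern(2.0, gap=6.0, v_ego=20.0, v_lead=20.0))
```

In a training loop, the learner would propose `a_rl` at each step and the environment would receive `govern(a_rl, ...)` instead, so the RL algorithm itself is left unchanged and the filter only intervenes when the proposed action would leave the assumed safe set.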