Ensuring safety is a crucial challenge when deploying reinforcement learning (RL) to real-world systems. We develop confidence-based safety filters, a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard RL techniques, based on probabilistic dynamics models. Our approach is based on a reformulation of state constraints in terms of cost functions, reducing safety verification to a standard RL task. By exploiting the concept of hallucinating inputs, we extend this formulation to determine a "backup" policy that is safe for the unknown system with high probability. Finally, the nominal policy is minimally adjusted at every time step during a roll-out towards the backup policy, such that safe recovery can be guaranteed afterwards. We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
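To make the filtering mechanism described above concrete, the following is a minimal illustrative sketch of a confidence-based safety filter roll-out loop. It is not the authors' implementation: the functions `nominal_policy`, `backup_policy`, `dynamics`, and `certified_safe`, as well as the simple interpolation-based "minimal adjustment", are hypothetical placeholders standing in for the learned policy, the certified backup policy, the probabilistic dynamics model, and the high-probability safety certificate used in the paper.

```python
import numpy as np

# Hypothetical sketch of a confidence-based safety filter roll-out.
# All components below are illustrative stand-ins, not the authors' method.

def nominal_policy(x):
    # Placeholder nominal RL policy: weakly drives the state toward the origin.
    return -0.5 * x

def backup_policy(x):
    # Placeholder "backup" policy assumed to be safe with high probability.
    return -1.0 * x

def dynamics(x, u):
    # Placeholder system; the paper would use a probabilistic dynamics model
    # with confidence bounds instead of a known deterministic map.
    return 0.9 * x + u

def certified_safe(x, u, horizon=10, bound=1.0):
    # Crude stand-in for the probabilistic safety certificate: apply u once,
    # then follow the backup policy, checking the state constraint |x| <= bound.
    x_next = dynamics(x, u)
    for _ in range(horizon):
        if np.abs(x_next) > bound:
            return False
        x_next = dynamics(x_next, backup_policy(x_next))
    return np.abs(x_next) <= bound

def safety_filter(x, alphas=np.linspace(0.0, 1.0, 11)):
    # Minimally adjust the nominal action toward the backup action: pick the
    # smallest interpolation weight for which safe recovery is still certified.
    u_nom, u_bak = nominal_policy(x), backup_policy(x)
    for a in alphas:
        u = (1 - a) * u_nom + a * u_bak
        if certified_safe(x, u):
            return u
    return u_bak  # no safe blend found: fall back to the backup action

# Example roll-out: the filtered actions keep the state within the constraint set.
x = np.array([0.8])
for t in range(20):
    u = safety_filter(x)
    x = dynamics(x, u)
    assert np.abs(x) <= 1.0, "state constraint violated"
```

The key design choice this sketch mirrors is that the filter intervenes only as much as needed: the nominal action is applied unchanged whenever safe recovery via the backup policy can still be certified, and is otherwise shifted toward the backup action.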