The transfer of reinforcement learning (RL) techniques to real-world applications is challenged by safety requirements in the presence of physical limitations. Most RL methods, in particular the most popular algorithms, do not support explicit consideration of state and input constraints. In this paper, we address this problem for nonlinear systems with continuous state and input spaces by introducing a predictive safety filter, which turns a constrained dynamical system into an unconstrained safe system to which any RL algorithm can be applied "out of the box". The predictive safety filter receives the proposed control input and decides, based on the current system state, whether it can be safely applied to the real system or must be modified. Safety is thereby established by a continuously updated safety policy, which is based on a model predictive control formulation that uses a data-driven system model and accounts for state- and input-dependent uncertainties.
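The filtering step can be illustrated with a minimal sketch. This is not the authors' method. The paper's safety policy solves a model predictive control problem over a learned model. Here that optimization is replaced by a simpler scheme: a candidate input is certified if applying it and then following a fixed braking backup plan keeps the predicted trajectory inside the constraints and ends at rest. The filter then bisects along the line from the backup input toward the learning input to find the certified input closest to it. The double-integrator model, the constants, the per-step tightening `margin`, and the names `DoubleIntegrator`, `certify`, and `safety_filter` are all hypothetical. The bisection also assumes the certified inputs on that segment form one interval containing the backup input.

```python
from dataclasses import dataclass


@dataclass
class DoubleIntegrator:
    """Hypothetical nominal model: position p, velocity v, acceleration input u."""
    dt: float = 0.1
    p_max: float = 1.0  # state constraint |p| <= p_max
    u_max: float = 1.0  # input constraint |u| <= u_max

    def clip(self, u: float) -> float:
        return max(-self.u_max, min(self.u_max, u))

    def step(self, p: float, v: float, u: float) -> tuple[float, float]:
        # Semi-implicit Euler discretization of p'' = u.
        v = v + self.dt * u
        return p + self.dt * v, v

    def brake(self, v: float) -> float:
        # Backup input (assumption): decelerate as hard as allowed, stopping at v = 0.
        return self.clip(-v / self.dt)


def certify(m, p, v, u0, horizon, margin):
    """True if applying u0 and then braking keeps the predicted trajectory
    inside the tightened constraints and ends at rest."""
    if abs(u0) > m.u_max:
        return False
    u = u0
    for k in range(1, horizon + 1):
        p, v = m.step(p, v, u)
        # Hypothetical tightening growing with prediction step k: a crude
        # stand-in for the paper's uncertainty-dependent constraint tightening.
        if abs(p) > m.p_max - margin * k:
            return False
        u = m.brake(v)
    # Terminal set: at rest, which stays feasible under u = 0.
    return abs(v) <= 1e-9


def safety_filter(m, p, v, u_learn, horizon=30, margin=0.0, iters=40):
    """Return u_learn if certified; otherwise the certified input closest to it
    on the segment from the backup input (assumes a single certified interval)."""
    u_learn = m.clip(u_learn)
    if certify(m, p, v, u_learn, horizon, margin):
        return u_learn  # proposed input passes through unchanged
    u_safe = m.brake(v)
    if not certify(m, p, v, u_safe, horizon, margin):
        return u_safe  # no certificate found: fall back to braking
    # Bisection on the fraction t in [0, 1] from u_safe (certified) to u_learn (rejected).
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if certify(m, p, v, u_safe + mid * (u_learn - u_safe), horizon, margin):
            lo = mid
        else:
            hi = mid
    return u_safe + lo * (u_learn - u_safe)


if __name__ == "__main__":
    m = DoubleIntegrator()
    print(safety_filter(m, 0.0, 0.0, 0.5))  # far from the bound: 0.5 passes through
    print(safety_filter(m, 0.5, 1.0, 1.0))  # would overshoot p_max: modified to about -0.5
```

The filter only intervenes when the proposed input cannot be certified, which mirrors the "out of the box" idea: the learning algorithm never sees the constraints, only the possibly modified input. A faithful implementation would replace the braking plan and the bisection with an MPC problem over the learned model, solved at every time step.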