Reinforcement Learning (RL) has been shown to be effective in many scenarios. However, it typically requires the exploration of a sufficiently large number of state-action pairs, some of which may be unsafe. Consequently, its application to safety-critical systems remains a challenge. An increasingly common approach to addressing safety involves adding a safety layer that projects the RL actions onto a safe set of actions. In turn, a difficulty for such frameworks is how to effectively couple RL with the safety layer to improve learning performance. In this paper, we frame safety as a differentiable robust-control-barrier-function layer in a model-based RL framework. Moreover, we propose an approach to learn the underlying reward-driven task modularly, independently of the safety constraints. We demonstrate that this approach both ensures safety and effectively guides exploration during training in a range of experiments, including zero-shot transfer when the reward is learned in a modular way.
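To make the idea of a safety layer concrete, the sketch below shows a minimal action-projection step under simplifying assumptions that are not taken from the paper: a control-affine system, a single control barrier function (CBF) constraint, and a closed-form minimal-norm correction instead of a general (differentiable) QP. The function name `rcbf_safety_layer`, its arguments, and the optional `disturbance_margin` term standing in for the robust part of the constraint are all hypothetical and for illustration only.

```python
import numpy as np

def rcbf_safety_layer(u_rl, grad_h, f_x, g_x, alpha_h, disturbance_margin=0.0):
    """Project an RL action onto the half-space induced by one (robust) CBF constraint.

    For a control-affine system x_dot = f(x) + g(x) u and a barrier function h(x),
    the CBF condition  grad_h @ (f_x + g_x @ u) + alpha_h - disturbance_margin >= 0
    is affine in u, i.e. it can be written as  a @ u >= b.  With a single constraint,
    the minimal-L2 correction of the nominal action has a closed form.
    """
    a = g_x.T @ grad_h                              # constraint direction in action space
    b = -(grad_h @ f_x + alpha_h) + disturbance_margin
    if a @ u_rl >= b:                               # nominal RL action already satisfies the CBF condition
        return u_rl
    # Otherwise, move u_rl onto the constraint boundary a @ u = b with minimal L2 change.
    return u_rl + (b - a @ u_rl) / (a @ a + 1e-9) * a

if __name__ == "__main__":
    # Toy 1D integrator that must stay in x >= 0, with h(x) = x and alpha(h) = h.
    x = 0.1
    u_rl = np.array([-2.0])                         # nominal action pushing toward the unsafe region
    u_safe = rcbf_safety_layer(
        u_rl,
        grad_h=np.array([1.0]),
        f_x=np.array([0.0]),
        g_x=np.array([[1.0]]),
        alpha_h=x,
    )
    print(u_safe)                                   # -> [-0.1], the least-restrictive safe action
```

In the paper's setting the layer is differentiable, so gradients can flow from the corrected action back into the policy; a sketch like this would typically be replaced by a differentiable QP solver when multiple constraints are active.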