Deep Reinforcement Learning (RL) has shown promise in addressing complex robotic challenges. In real-world applications, RL is often accompanied by failsafe controllers as a last resort to avoid catastrophic events. While necessary for safety, these interventions can result in undesirable behaviors, such as abrupt braking or aggressive steering. This paper proposes two safety intervention reduction methods, action replacement and action projection, which alter the agent's action whenever it would lead to an unsafe state. These approaches are compared against state-of-the-art constrained RL methods on the OpenAI Safety Gym benchmark and a human-robot collaboration task. Our study demonstrates that combining our methods with provably safe RL yields high-performing policies with zero safety violations and few failsafe interventions. Our versatile approach can be applied to a wide range of real-world robotics tasks, effectively improving safety without sacrificing task performance.
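To make the distinction between the two intervention styles concrete, the following is a minimal toy sketch, not the paper's actual implementation: a 1D point mass where a hypothetical safety layer (`is_safe`) checks whether an action keeps the next position within bounds. Action replacement swaps an unsafe action for a fixed fallback, while action projection alters the unsafe action by the minimal amount needed to stay in the safe set.

```python
import numpy as np

# Illustrative assumptions: a point mass with position `pos`, velocity
# commands as actions, and safety defined as |next position| <= POS_LIMIT.
DT = 0.1          # integration step
POS_LIMIT = 1.0   # safe position bound


def is_safe(pos: float, action: float) -> bool:
    """Check whether applying `action` keeps the next position in bounds."""
    return abs(pos + DT * action) <= POS_LIMIT


def replace_action(pos: float, action: float, failsafe: float = 0.0) -> float:
    """Action replacement: swap in a verified-safe fallback (here: stop)."""
    return action if is_safe(pos, action) else failsafe


def project_action(pos: float, action: float) -> float:
    """Action projection: shift the action the minimal distance needed so
    the next state is safe (clip the predicted position to the safe set)."""
    if is_safe(pos, action):
        return action
    next_pos = np.clip(pos + DT * action, -POS_LIMIT, POS_LIMIT)
    return (next_pos - pos) / DT


if __name__ == "__main__":
    pos, agent_action = 0.95, 2.0               # agent proposes an unsafe action
    print(replace_action(pos, agent_action))    # 0.0: fallback replaces the action
    print(project_action(pos, agent_action))    # 0.5: minimally altered action
```

In this toy setting, replacement discards the agent's intent entirely (the abrupt-braking behavior noted above), whereas projection preserves as much of the proposed action as the safety constraint allows.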