Learning from Demonstration (LfD) is a powerful method for enabling robots to perform novel tasks: it is often more tractable for a non-roboticist end-user to demonstrate the desired skill, and for the robot to learn efficiently from the resulting data, than for a human to engineer a reward function so the robot can acquire the skill via reinforcement learning (RL). Safety issues arise in modern LfD techniques, e.g., Inverse Reinforcement Learning (IRL), just as they do in RL; yet safe learning in LfD has received little attention. For agile robots, safety is especially vital because of the risk of robot-environment collisions, robot-human collisions, and damage to the robot. In this paper, we propose a safe IRL framework, CBFIRL, that leverages Control Barrier Functions (CBFs) to enhance the safety of the IRL policy. The core idea of CBFIRL is to combine a loss function inspired by CBF requirements with the objective of an IRL method, and to optimize both jointly via gradient descent. In our experiments, the framework behaves more safely than IRL methods without a CBF: $\sim15\%$ and $\sim20\%$ improvement on two difficulty levels of a 2D racecar domain, and $\sim50\%$ improvement on a 3D drone domain.
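As a rough illustration of how a CBF-inspired penalty can be combined with an IRL objective and optimized jointly by gradient descent, consider the minimal sketch below. The barrier values `h`, the class-K gain `alpha`, the weight `lambda_cbf`, and the discrete-time hinge surrogate are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the authors' implementation) of jointly minimizing an IRL
# objective and a CBF-inspired safety loss via gradient descent.
import torch

alpha = 1.0        # assumed class-K gain in the CBF condition  h_dot + alpha * h >= 0
lambda_cbf = 10.0  # assumed weight trading off imitation quality against safety


def cbf_loss(h, h_next, dt):
    """Hinge penalty on violations of a discrete-time CBF condition.

    Penalizes states where (h_next - h) / dt + alpha * h < 0, i.e. where the
    barrier value is allowed to decay faster than the CBF condition permits.
    """
    residual = (h_next - h) / dt + alpha * h
    return torch.relu(-residual).mean()


def total_loss(irl_loss, h, h_next, dt):
    """Joint objective: IRL loss plus the CBF-inspired safety penalty."""
    return irl_loss + lambda_cbf * cbf_loss(h, h_next, dt)
```

In this sketch, `irl_loss` would come from whatever IRL method is used, while `h` and `h_next` are barrier-function values at consecutive states along rollouts; a single backward pass through `total_loss` then updates the policy with respect to both terms at once.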