Learning from Demonstration (LfD) is a powerful method for enabling robots to perform novel tasks: it is often more tractable for a non-roboticist end-user to demonstrate a desired skill, and for the robot to learn efficiently from the resulting data, than for a human to engineer a reward function so the robot can acquire the skill via reinforcement learning (RL). Safety issues arise in modern LfD techniques, e.g., Inverse Reinforcement Learning (IRL), just as they do in RL; yet safe learning in LfD has received little attention. For agile robots, safety is especially vital due to the possibility of robot-environment collisions, robot-human collisions, and damage to the robot. In this paper, we propose a safe IRL framework, CBFIRL, that leverages a Control Barrier Function (CBF) to enhance the safety of the IRL policy. The core idea of CBFIRL is to combine a loss function inspired by the CBF requirement with the objective of an IRL method, and to optimize both jointly via gradient descent. In our experiments, the framework yields safer policies than IRL methods without a CBF: a $\sim15\%$ and $\sim20\%$ safety improvement on two difficulty levels of a 2D racecar domain, and a $\sim50\%$ improvement on a 3D drone domain.
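To make the joint optimization concrete, below is a minimal sketch of the combined objective, assuming a PyTorch setup with a differentiable barrier function $h(x)$ (e.g., signed distance to obstacles). The names `barrier_h`, `irl_loss`, and `lambda_cbf`, and the discrete-time finite-difference form of the CBF condition, are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cbf_loss(states, next_states, barrier_h, alpha=1.0, dt=0.05):
    """Penalize violations of a discrete-time CBF condition:
    (h(x') - h(x)) / dt + alpha * h(x) >= 0."""
    h = barrier_h(states)
    h_next = barrier_h(next_states)
    h_dot = (h_next - h) / dt          # finite-difference estimate of dh/dt
    violation = -(h_dot + alpha * h)   # positive where the CBF condition fails
    return F.relu(violation).mean()    # hinge penalty on violations only

def joint_step(optimizer, irl_loss, states, next_states, barrier_h,
               lambda_cbf=10.0):
    """One gradient-descent step on the combined IRL + CBF objective,
    as described in the abstract above."""
    loss = irl_loss + lambda_cbf * cbf_loss(states, next_states, barrier_h)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here the CBF-inspired term is simply added to the IRL objective with a weight `lambda_cbf`, so a single optimizer updates the policy with respect to both task performance and safety.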