This work develops a deep inverse reinforcement learning method for terrain traversability modeling with legged robots that incorporates both exteroceptive and proprioceptive sensory data. Existing works use robot-agnostic exteroceptive environmental features or handcrafted kinematic features; instead, we propose to also learn robot-specific inertial features from proprioceptive sensory data for reward approximation in a single deep neural network. Incorporating the inertial features can improve model fidelity and provide a reward that depends on the robot's state during deployment. We train the reward network using the Maximum Entropy Deep Inverse Reinforcement Learning (MEDIRL) algorithm and propose simultaneously minimizing a trajectory ranking loss to deal with the suboptimality of legged robot demonstrations. The demonstrated trajectories are ranked by locomotion energy consumption, in order to learn an energy-aware reward function and a policy that is more energy-efficient than the demonstrations. We evaluate our method on a dataset collected by an MIT Mini-Cheetah robot and in a Mini-Cheetah simulator. The code is publicly available at https://github.com/ganlumomo/minicheetah-traversability-irl.
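To make the combined objective concrete, the sketch below pairs a MEDIRL surrogate loss (matching demonstration and policy state-visitation frequencies over a reward map) with a Bradley-Terry style pairwise ranking loss over energy-ranked trajectory pairs. This is a minimal sketch under stated assumptions, not the authors' released implementation: the grid-based reward map, the precomputed visitation frequencies (`svf_demo`, `svf_policy`), the trajectory index arrays, and the weighting `lam` are all illustrative.

```python
# Minimal sketch (NOT the released implementation) of a combined
# MEDIRL + trajectory-ranking objective. Assumes a grid-based reward map,
# precomputed state-visitation frequencies, and trajectories given as
# (row, col) index arrays into that map -- all illustrative assumptions.
import torch
import torch.nn.functional as F

def medirl_loss(reward_map, svf_demo, svf_policy):
    """Surrogate whose gradient w.r.t. reward_map is -(svf_demo - svf_policy):
    gradient descent raises the reward where demonstrations visit more
    often than the current policy does, as in MEDIRL."""
    return -torch.sum(reward_map * (svf_demo - svf_policy).detach())

def ranking_loss(return_worse, return_better):
    """Bradley-Terry style pairwise loss: the trajectory with lower
    locomotion energy consumption should receive the higher return."""
    logits = torch.stack([return_worse, return_better]).unsqueeze(0)
    return F.cross_entropy(logits, torch.tensor([1]))

def total_loss(reward_map, svf_demo, svf_policy, ranked_pairs, lam=1.0):
    """MEDIRL demonstration loss plus the ranking loss, minimized jointly.
    `ranked_pairs` holds (high_energy_traj, low_energy_traj) index arrays."""
    loss = medirl_loss(reward_map, svf_demo, svf_policy)
    for traj_hi, traj_lo in ranked_pairs:
        ret_hi = reward_map[traj_hi[:, 0], traj_hi[:, 1]].sum()
        ret_lo = reward_map[traj_lo[:, 0], traj_lo[:, 1]].sum()
        loss = loss + lam * ranking_loss(ret_hi, ret_lo)
    return loss
```

In the full method, the reward map would be produced by the reward network from exteroceptive and proprioceptive features, and the policy visitation frequencies would be recomputed each training iteration (e.g., via soft value iteration under the current reward), per the standard MEDIRL loop.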