Scaling model-based inverse reinforcement learning (IRL) to real robotic manipulation tasks with unknown dynamics remains an open problem. The key challenges lie in learning good dynamics models, developing algorithms that scale to high-dimensional state spaces, and learning from both visual and proprioceptive demonstrations. In this work, we present a gradient-based inverse reinforcement learning framework that utilizes a pre-trained visual dynamics model to learn cost functions given only visual human demonstrations. The learned cost functions are then used to reproduce the demonstrated behavior via visual model predictive control. We evaluate our framework on hardware on two basic object manipulation tasks.
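The core loop of such a framework can be sketched as an unrolled, differentiable planner: an inner loop optimizes an action sequence through the frozen, pre-trained dynamics model to minimize the learned cost, and an outer loop updates the cost parameters so that the planned rollout matches the demonstration. The sketch below is a minimal PyTorch illustration under assumed toy dimensions, with a low-dimensional stand-in for the visual state; the module names, sizes, and trajectory-matching loss are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HORIZON = 8, 2, 10


class DynamicsModel(nn.Module):
    """Stand-in for the pre-trained (visual) dynamics model; kept frozen during IRL."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, STATE_DIM))

    def forward(self, state, action):
        # Residual one-step prediction: s_{t+1} = s_t + f(s_t, a_t).
        return state + self.net(torch.cat([state, action], dim=-1))


class LearnedCost(nn.Module):
    """Parametric cost over predicted states, learned from demonstrations."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, state):
        return self.net(state).squeeze(-1)


def plan(cost, dynamics, s0, inner_steps=10, lr=0.1):
    """Inner loop: unrolled gradient-based planning through the dynamics model.
    create_graph=True keeps the planner differentiable, so the outer IRL loss
    can backpropagate into the cost parameters."""
    actions = torch.zeros(HORIZON, ACTION_DIM, requires_grad=True)
    for _ in range(inner_steps):
        s, total = s0, torch.zeros(())
        for t in range(HORIZON):
            s = dynamics(s, actions[t])
            total = total + cost(s)
        (grad,) = torch.autograd.grad(total, actions, create_graph=True)
        actions = actions - lr * grad
    # Roll out the optimized actions; the states stay differentiable w.r.t. the cost.
    s, states = s0, []
    for t in range(HORIZON):
        s = dynamics(s, actions[t])
        states.append(s)
    return torch.stack(states)


dynamics = DynamicsModel()
for p in dynamics.parameters():
    p.requires_grad_(False)  # the pre-trained dynamics model stays fixed

cost = LearnedCost()
cost_opt = torch.optim.Adam(cost.parameters(), lr=1e-3)

s0 = torch.zeros(STATE_DIM)
demo = torch.randn(HORIZON, STATE_DIM)  # placeholder for a demonstration trajectory

for _ in range(50):  # outer IRL loop: make the planned rollout match the demo
    planned = plan(cost, dynamics, s0)
    irl_loss = ((planned - demo) ** 2).mean()
    cost_opt.zero_grad()
    irl_loss.backward()
    cost_opt.step()
```

Because the inner planner is unrolled with `create_graph=True`, the demonstration-matching loss differentiates through the planning procedure itself, which is what makes the cost learning gradient-based rather than sampling-based in this sketch.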