Research on Inverse Reinforcement Learning (IRL) from third-person videos has shown encouraging results on removing the need for manual reward design for robotic tasks. However, most prior works are still limited by training from a relatively restricted domain of videos. In this paper, we argue that the true potential of third-person IRL lies in increasing the diversity of videos for better scaling. To learn a reward function from diverse videos, we propose to perform graph abstraction on the videos followed by temporal matching in the graph space to measure the task progress. Our insight is that a task can be described by entity interactions that form a graph, and this graph abstraction can help remove irrelevant information such as textures, resulting in more robust reward functions. We evaluate our approach, GraphIRL, on cross-embodiment learning in X-MAGICAL and learning from human demonstrations for real-robot manipulation. We show significant improvements in robustness to diverse video demonstrations over previous approaches, and even achieve better results than manual reward design on a real-robot pushing task. Videos are available at https://sateeshkumar21.github.io/GraphIRL.
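The core idea above — abstract each frame into a graph of entity interactions, then measure task progress by temporal matching against a demonstration in graph space — can be illustrated with a toy sketch. This is a hypothetical, heavily simplified illustration (the function names, the scalar graph embedding, and the nearest-neighbor matching are all assumptions for exposition, not the authors' actual implementation, which uses learned graph embeddings):

```python
import math

def graph_embedding(entities):
    """Embed a frame's entity-interaction graph as pairwise-distance statistics.

    `entities` is a list of (x, y) positions; edges are implicitly fully
    connected. Because only inter-entity geometry is kept, the embedding is
    invariant to appearance details such as textures by construction.
    (Illustrative stand-in for a learned graph encoder.)
    """
    dists = [math.dist(a, b) for i, a in enumerate(entities)
             for b in entities[i + 1:]]
    return sum(dists) / len(dists)  # scalar embedding, for illustration only

def progress_reward(frame_entities, demo_trajectory):
    """Temporal matching: reward = normalized index of the closest demo frame."""
    emb = graph_embedding(frame_entities)
    demo_embs = [graph_embedding(f) for f in demo_trajectory]
    closest = min(range(len(demo_embs)), key=lambda t: abs(demo_embs[t] - emb))
    return closest / (len(demo_embs) - 1)

# Toy "pushing" demo: entity 0 (gripper) approaches entity 1 (object).
demo = [
    [(0.0, 0.0), (1.0, 0.0)],  # far apart (start)
    [(0.5, 0.0), (1.0, 0.0)],  # closer
    [(0.9, 0.0), (1.0, 0.0)],  # touching (goal)
]

# A query frame mid-way between demo frames 0 and 1 matches frame 1.
print(progress_reward([(0.45, 0.0), (1.0, 0.0)], demo))  # → 0.5
```

The returned progress value can serve directly as a dense reward signal for downstream reinforcement learning, replacing a manually designed reward.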