We consider the problem of learning useful robotic skills from previously collected offline data, without access to manually specified rewards or additional online exploration — a setting that is becoming increasingly important for scaling robot learning by reusing past robotic data. In particular, we propose the objective of learning a functional understanding of the environment by learning to reach any goal state in a given dataset. We employ goal-conditioned Q-learning with hindsight relabeling and develop several techniques that enable training in a particularly challenging offline setting. We find that our method can operate on high-dimensional camera images and learn a variety of skills on real robots that generalize to previously unseen scenes and objects. We also show that our method can learn to reach long-horizon goals across multiple episodes through goal chaining, and learn rich representations that can help with downstream tasks through pre-training or auxiliary objectives. Videos of our experiments can be found at https://actionable-models.github.io
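To make the core idea of hindsight relabeling concrete, the following is a minimal sketch of how transitions from an offline trajectory can be relabeled with goals drawn from states actually reached later in that trajectory, yielding sparse goal-reaching rewards for goal-conditioned Q-learning. All function and variable names here are illustrative assumptions, not the paper's implementation.

```python
import random

def relabel_trajectory(trajectory, num_goals=4):
    """Hindsight-relabel a trajectory of (state, action) pairs.

    For each transition (s_t, a_t, s_{t+1}), sample up to `num_goals`
    goal states from states reached later in the same trajectory, and
    emit (state, action, next_state, goal, reward) tuples with a sparse
    reward of 1.0 only when the next state equals the goal.
    """
    relabeled = []
    for t in range(len(trajectory) - 1):
        state, action = trajectory[t]
        next_state = trajectory[t + 1][0]
        # Candidate goals: states the trajectory actually reached after t,
        # so every relabeled goal is achievable from this transition.
        future_states = [s for s, _ in trajectory[t + 1:]]
        for goal in random.sample(future_states,
                                  min(num_goals, len(future_states))):
            reward = 1.0 if next_state == goal else 0.0
            relabeled.append((state, action, next_state, goal, reward))
    return relabeled
```

The relabeled tuples can then feed a standard goal-conditioned Q-learning update; because goals come from the data itself, no manually specified reward function is needed.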