We consider the problem of learning useful robotic skills from previously collected offline data, without access to manually specified rewards or additional online exploration — a setting that is becoming increasingly important for scaling robot learning by reusing past robotic data. In particular, we propose the objective of learning a functional understanding of the environment by learning to reach any goal state in a given dataset. We employ goal-conditioned Q-learning with hindsight relabeling and develop several techniques that enable training in a particularly challenging offline setting. We find that our method can operate on high-dimensional camera images and learn a variety of skills on real robots that generalize to previously unseen scenes and objects. We also show that our method can learn to reach long-horizon goals across multiple episodes, and learn rich representations that can help with downstream tasks through pre-training or auxiliary objectives. Videos of our experiments can be found at https://actionable-models.github.io.
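As a minimal sketch of the hindsight relabeling component mentioned above: transitions from a logged trajectory are re-labeled with goals drawn from states reached later in the same trajectory, turning unlabeled offline data into goal-conditioned training examples. The function name, the sparse goal-reaching reward, and the sampling scheme are illustrative assumptions, not the paper's exact implementation:

```python
import random
import numpy as np

def hindsight_relabel(trajectory, num_relabeled=4):
    """Relabel logged transitions with goals taken from future states
    of the same trajectory (hindsight relabeling for goal-conditioned
    Q-learning).

    trajectory: list of (state, action, next_state) tuples.
    Returns a list of (state, action, next_state, goal, reward) tuples,
    where reward is 1.0 when next_state reaches the goal, else 0.0
    (a sparse goal-reaching reward, assumed here for illustration).
    """
    relabeled = []
    T = len(trajectory)
    for t, (s, a, s_next) in enumerate(trajectory):
        for _ in range(num_relabeled):
            # Sample a goal from a state visited at step t or later,
            # so every relabeled transition is consistent with the data.
            future = random.randint(t, T - 1)
            goal = trajectory[future][2]  # next_state at the future step
            reward = 1.0 if np.array_equal(s_next, goal) else 0.0
            relabeled.append((s, a, s_next, goal, reward))
    return relabeled
```

The relabeled tuples would then feed a standard Q-learning update on a goal-conditioned Q-function Q(s, a, g); the offline-specific stabilization techniques the abstract refers to are beyond this sketch.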