Robot learning methods have the potential for broad generalization across tasks, environments, and objects. However, these methods require large, diverse datasets that are expensive to collect in real-world robotics settings. For robot learning to generalize, we must be able to leverage sources of data or priors beyond the robot's own experience. In this work, we posit that image-text generative models, which are pre-trained on large corpora of web-scraped data, can serve as such a data source. We show that although these generative models are trained largely on non-robotics data, they can effectively impart priors to the robot learning process in a way that enables broad generalization. In particular, we show how pre-trained generative models can serve as effective tools for semantically meaningful data augmentation. By using these pre-trained models to generate appropriate "semantic" data augmentations, we propose GenAug, a system that significantly improves policy generalization. We apply GenAug to tabletop manipulation tasks, showing that it can re-target behavior to novel scenarios while requiring only small amounts of real-world data. We demonstrate the efficacy of this system on a number of real-world object manipulation problems, showing a 40% improvement in generalization to novel scenes and objects.
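The core idea of semantic augmentation can be illustrated with a minimal sketch, assuming demonstrations are (observation, action) pairs and that a text-conditioned generative model can be treated as a black-box function from an image and a text prompt to a new image. Each generated variant changes only the observation and reuses the original action label. This is not GenAug's actual pipeline: `Demo`, `augment_dataset`, and `toy_generator` are hypothetical names, and `toy_generator` merely recolors background pixels as a stand-in for a real image-text model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A tiny grayscale "image": rows of pixel values (0 = background).
Image = List[List[int]]


@dataclass(frozen=True)
class Demo:
    """One demonstration step: an observation paired with the action taken."""
    image: Image
    action: Tuple[int, int]  # hypothetical (row, col) pick location


# Hypothetical generator signature: (image, text prompt) -> new image.
Generator = Callable[[Image, str], Image]


def augment_dataset(demos: Sequence[Demo], prompts: Sequence[str],
                    generate: Generator) -> List[Demo]:
    """Return the original demos plus one generated variant per (demo, prompt).

    Only the observation is changed; the action label is copied unchanged,
    so a few real demonstrations can cover many new-looking scenes.
    """
    augmented = list(demos)
    for demo in demos:
        for prompt in prompts:
            augmented.append(Demo(generate(demo.image, prompt), demo.action))
    return augmented


def toy_generator(image: Image, prompt: str) -> Image:
    """Hypothetical stand-in for a text-conditioned image model.

    Replaces background pixels (value 0) with a prompt-dependent shade in
    1..200 and leaves object pixels untouched, so the pick location stays valid.
    """
    shade = 1 + sum(map(ord, prompt)) % 200
    return [[shade if px == 0 else px for px in row] for row in image]


if __name__ == "__main__":
    real = [Demo([[0, 0, 255], [0, 0, 0]], (0, 2))]
    prompts = ["wooden table", "marble countertop", "cluttered kitchen"]
    data = augment_dataset(real, prompts, toy_generator)
    print(len(data))  # 1 real demo + 3 generated variants
```

Keeping the action fixed is what makes such augmentation cheap: no new robot interaction is needed, and the policy sees the same behavior under varied visual conditions. A real system would additionally need to ensure the generated edits do not move or alter the objects the action depends on.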