We consider how to most efficiently leverage teleoperator time to collect data for learning robust image-based value functions and policies for sparse-reward robotic tasks. To accomplish this goal, we modify the process of data collection to include more than just successful demonstrations of the desired task. Instead, we develop a novel protocol that we call Visual Backtracking Teleoperation (VBT), which deliberately collects a dataset of visually similar failures, recoveries, and successes. VBT data collection is particularly useful for efficiently learning accurate value functions from small datasets of image-based observations. We demonstrate VBT on a real robot performing continuous control from image observations for the deformable-manipulation task of T-shirt grasping. We find that by adjusting the data collection process we improve the quality of both the learned value functions and policies over a variety of baseline data collection methods. Specifically, we find that offline reinforcement learning on VBT data outperforms standard behavior cloning on successful demonstration data by 13% when both methods are given equal-sized datasets of 60 minutes of data from the real robot.