Enabling robots to learn novel visuomotor skills in a data-efficient manner remains an unsolved problem with myriad challenges. A popular paradigm for tackling this problem is to leverage large unlabeled datasets containing many behaviors and then adapt a policy to a specific task using a small amount of task-specific human supervision (i.e., interventions or demonstrations). However, how best to leverage the narrow task-specific supervision and balance it with offline data remains an open question. Our key insight in this work is that task-specific data not only provides new data for an agent to train on but can also inform the type of prior data the agent should use for learning. Concretely, we propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset (including many sub-optimal behaviors). The agent is then jointly trained on the expert and queried data. We observe that our method learns to query only the transitions relevant to the task, filtering out sub-optimal or task-irrelevant data. By doing so, it learns more effectively from the mix of task-specific and offline data than approaches that naively mix the data or use only the task-specific data. Furthermore, we find that our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images. See https://sites.google.com/view/behaviorretrieval for videos and code.
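The query-then-train idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the abstract does not specify how relevance is measured, so this sketch assumes each transition already has a fixed-length embedding, scores each offline transition by its best cosine similarity to any expert transition, and keeps a fixed top fraction. The function name `retrieve_relevant` and the parameter `keep_fraction` are hypothetical.

```python
import numpy as np

def retrieve_relevant(expert_emb, offline_emb, keep_fraction=0.25):
    """Return indices of the offline transitions most similar to the expert data.

    Hypothetical helper: assumes embeddings are precomputed, nonzero row vectors.
    """
    # Normalize rows so that dot products are cosine similarities.
    e = expert_emb / np.linalg.norm(expert_emb, axis=1, keepdims=True)
    o = offline_emb / np.linalg.norm(offline_emb, axis=1, keepdims=True)
    # Score each offline transition by its best match among expert transitions.
    scores = (o @ e.T).max(axis=1)
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    # Pick the n_keep highest-scoring transitions, returned in index order.
    top = np.argpartition(-scores, n_keep - 1)[:n_keep]
    return np.sort(top)

# Toy data (assumed): 20 task-relevant and 60 task-irrelevant offline embeddings.
rng = np.random.default_rng(0)
expert = rng.normal(loc=[5.0, 0.0], scale=0.1, size=(10, 2))
relevant = rng.normal(loc=[5.0, 0.0], scale=0.1, size=(20, 2))
irrelevant = rng.normal(loc=[0.0, 5.0], scale=0.1, size=(60, 2))
offline = np.vstack([relevant, irrelevant])

idx = retrieve_relevant(expert, offline, keep_fraction=0.25)
# Joint training set: expert data plus the retrieved offline data.
train_set = np.vstack([expert, offline[idx]])
```

In this toy setup the retrieved indices are exactly the 20 task-relevant rows, and `train_set` would then be passed to whatever policy-learning routine is in use. A fixed `keep_fraction` is only one possible selection rule; a similarity threshold would serve the same purpose.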