Batch Exploration with Examples for Scalable Robotic Reinforcement Learning

Learning from diverse offline datasets is a promising path towards learning general-purpose robotic agents. However, a core challenge in this paradigm lies in collecting large amounts of meaningful data without depending on a human in the loop for data collection. One way to address this challenge is through task-agnostic exploration, where an agent attempts to explore without a task-specific reward function and collects data that can be useful for any downstream task. While these approaches have shown some promise in simple domains, they often struggle to explore the relevant regions of the state space in more challenging settings, such as vision-based robotic manipulation. This challenge stems from an objective that encourages exploring everything in a potentially vast state space. To mitigate this challenge, we propose to focus exploration on the important parts of the state space using weak human supervision. Concretely, we propose an exploration technique, Batch Exploration with Examples (BEE), that explores relevant regions of the state space, guided by a modest number of human-provided images of important states. These human-provided images only need to be collected once at the beginning of data collection and can be collected in a matter of minutes, allowing us to scalably collect diverse datasets, which can then be combined with any batch RL algorithm. We find that BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot, and observe that compared to task-agnostic and weakly-supervised exploration techniques, it (1) interacts more than twice as often with relevant objects, and (2) improves downstream task performance when used in conjunction with offline RL.
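
The abstract does not spell out how the human-provided examples steer exploration, so the following is only a toy illustration of the general idea and should not be read as the BEE algorithm. It assumes a 2-D grid whose coordinates stand in for images, a hand-written `relevance` score (negative squared distance to the nearest example) in place of any learned model, and a simple revisit penalty. All function names and parameters here are hypothetical.

```python
import random

def relevance(state, examples):
    """Toy relevance score: negative squared distance to the nearest example state."""
    return -min(sum((s - e) ** 2 for s, e in zip(state, ex)) for ex in examples)

def explore(examples, steps=200, guided=True, seed=0, size=20):
    """Walk on a size x size grid starting at (0, 0) and return the visited states.

    guided=True  -> pick the neighbor with the best relevance minus a revisit penalty.
    guided=False -> pick a neighbor uniformly at random (task-agnostic baseline).
    """
    rng = random.Random(seed)
    state = (0, 0)
    visits = {}
    trajectory = []
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(steps):
        # Candidate next states, clamped to stay inside the grid.
        candidates = [
            (min(max(state[0] + dx, 0), size - 1),
             min(max(state[1] + dy, 0), size - 1))
            for dx, dy in moves
        ]
        if guided:
            # The revisit penalty keeps the agent moving around the relevant
            # region instead of bouncing between the same two cells.
            state = max(
                candidates,
                key=lambda c: relevance(c, examples) - 0.5 * visits.get(c, 0),
            )
        else:
            state = rng.choice(candidates)
        visits[state] = visits.get(state, 0) + 1
        trajectory.append(state)
    return trajectory

def fraction_near(trajectory, examples, radius=3):
    """Fraction of visited states within `radius` of any example state."""
    near = sum(1 for s in trajectory if -relevance(s, examples) <= radius ** 2)
    return near / len(trajectory)
```

Calling `fraction_near(explore(examples, guided=True), examples)` and the same with `guided=False` for a small set of example states, such as `[(15, 15), (16, 14)]`, gives a rough sense of how much more collected data lands near the states a human marked as important when exploration is guided rather than undirected.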
