A practical approach to robot reinforcement learning is to first collect a large batch of real or simulated robot interaction data using some data collection policy, and then learn from this data to perform various tasks using offline learning algorithms. Previous work has focused on manually designing the data collection policy, and on tasks where suitable policies are easy to design, such as random picking policies for collecting object-grasping data. For more complex tasks, however, it may be difficult to find a data collection policy that explores the environment effectively and produces data diverse enough for the downstream task. In this work, we propose that data collection policies should actively explore the environment to collect diverse data. In particular, we develop a simple yet effective goal-conditioned reinforcement learning method that actively focuses data collection on novel observations, thereby collecting a diverse dataset. We evaluate our method on simulated robot manipulation tasks with visual inputs and show that the improved diversity of actively collected data leads to significant improvements on the downstream learning tasks.
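The abstract does not specify how novelty is measured or how goals are chosen, so the following is only a minimal sketch of one common way to realize novelty-driven goal selection: score candidate goal observations by their mean distance to the k nearest previously visited observations, and direct the goal-conditioned policy toward the highest-scoring one. The helper names `novelty_scores` and `select_goal` are hypothetical, not from the paper.

```python
import numpy as np

def novelty_scores(candidates, visited, k=5):
    """Mean Euclidean distance from each candidate to its k nearest
    previously visited observations; higher = less explored."""
    # Pairwise distances: (num_candidates, num_visited)
    dists = np.linalg.norm(candidates[:, None, :] - visited[None, :, :], axis=-1)
    k = min(k, visited.shape[0])
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

def select_goal(candidates, visited, k=5):
    """Pick the most novel candidate observation as the next goal
    for the goal-conditioned data collection policy."""
    return candidates[np.argmax(novelty_scores(candidates, visited, k))]

if __name__ == "__main__":
    # Toy example: the agent has only visited states near the origin,
    # so a far-away candidate should be selected as the next goal.
    visited = np.concatenate([np.zeros((5, 2)), 0.1 * np.ones((5, 2))])
    candidates = np.array([[0.1, 0.1], [5.0, 5.0]])
    print(select_goal(candidates, visited, k=3))
```

In a full pipeline, the chosen goal would condition the exploration policy for the next episode, and the resulting trajectories would be appended to both `visited` and the offline dataset; with visual inputs, distances would be computed in a learned embedding space rather than raw pixels.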