Reinforcement learning provides a general framework for flexible decision making and control, but requires extensive data collection for each new task that an agent needs to learn. In other machine learning fields, such as natural language processing or computer vision, pre-training on large, previously collected datasets to bootstrap learning for new tasks has emerged as a powerful paradigm to reduce data requirements when learning a new task. In this paper, we ask the following question: how can we enable similarly useful pre-training for RL agents? We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials from a wide range of previously seen tasks, and we show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors. We demonstrate the effectiveness of our approach in challenging robotic manipulation domains involving image observations and sparse reward functions, where our method outperforms prior works by a substantial margin.