Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems. However, in practice, real-world robotic RL typically requires time-consuming data collection and frequent human intervention to reset the environment. Moreover, robotic policies learned with RL often fail when deployed beyond the carefully controlled setting in which they were learned. In this work, we study how these challenges can all be tackled by effective utilization of diverse offline datasets collected from previously seen tasks. When faced with a new task, our system adapts previously learned skills to quickly learn to both perform the new task and return the environment to an initial state, effectively performing its own environment reset. Our empirical results demonstrate that incorporating prior data into robotic reinforcement learning enables autonomous learning, substantially improves the sample efficiency of learning, and improves generalization.
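To make the described training scheme concrete, the sketch below shows one plausible reading of it: a forward policy practices the new task while a backward policy learns to return the environment to an initial state, the two alternating so that training proceeds without human resets, and both replay buffers seeded with offline data from previously seen tasks. All names (`Policy`, `reset_free_training`, `steps_per_phase`) are hypothetical illustration, assuming a gym-style environment and an off-policy RL learner; this is not the paper's actual implementation.

```python
# Hypothetical sketch of reset-free RL with offline data, as described in the
# abstract; not the authors' code. Assumes a gym-style env interface.
import numpy as np


class Policy:
    """Placeholder policy; assume it is pretrained on the offline dataset
    and fine-tuned online with any off-policy RL algorithm."""

    def __init__(self, action_dim):
        self.action_dim = action_dim

    def act(self, obs):
        return np.random.uniform(-1.0, 1.0, self.action_dim)  # stub action

    def update(self, buffer):
        pass  # one gradient step of the chosen off-policy algorithm (stub)


def reset_free_training(env, offline_data, steps_per_phase=200, num_phases=100):
    forward = Policy(env.action_space.shape[0])   # performs the new task
    backward = Policy(env.action_space.shape[0])  # returns env to initial state
    forward_buffer = list(offline_data)           # seeded with prior-task data
    backward_buffer = list(offline_data)

    obs = env.reset()  # a single initial human-provided reset
    for phase in range(num_phases):
        # Alternate phases: even phases attempt the task, odd phases undo it,
        # so the robot provides its own environment resets.
        policy, buffer = (
            (forward, forward_buffer) if phase % 2 == 0
            else (backward, backward_buffer)
        )
        for _ in range(steps_per_phase):
            action = policy.act(obs)
            next_obs, reward, done, info = env.step(action)
            buffer.append((obs, action, reward, next_obs))
            policy.update(buffer)
            obs = next_obs
    return forward, backward
```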