Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems. However, in practice, real-world robotic RL typically requires time-consuming data collection and frequent human intervention to reset the environment. Moreover, robotic policies learned with RL often fail when deployed beyond the carefully controlled setting in which they were learned. In this work, we study how these challenges can all be tackled by effective utilization of diverse offline datasets collected from previously seen tasks. When faced with a new task, our system adapts previously learned skills to quickly learn both to perform the new task and to return the environment to an initial state, effectively performing its own environment reset. Our empirical results demonstrate that incorporating prior data into robotic reinforcement learning enables autonomous learning, substantially improves the sample efficiency of learning, and enables better generalization. Project website: https://sites.google.com/view/ariel-berkeley/
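To make the alternation described above concrete, here is a minimal, hypothetical Python sketch of the idea: a forward policy practices the new task while a reset policy learns to return the environment to an initial state, so training proceeds without manual resets. All names here (`ToyEnv`, `Policy`, `autonomous_training`) are illustrative placeholders, not the paper's actual implementation.

```python
# Minimal sketch (assumption, not the paper's code) of forward/reset alternation.
import random


class ToyEnv:
    """Stand-in 1-D environment: the forward goal is state 1.0, the reset goal is 0.0."""
    def __init__(self):
        self.state = 0.0

    def step(self, action, goal):
        self.state += max(-0.2, min(0.2, action))
        reward = -abs(self.state - goal)
        done = abs(self.state - goal) < 0.05
        return self.state, reward, done


class Policy:
    """Placeholder for an RL policy that would be initialized from prior offline data."""
    def __init__(self, goal):
        self.goal = goal

    def act(self, state):
        # Greedy stand-in for a learned policy.
        return self.goal - state + random.uniform(-0.05, 0.05)

    def update(self, transition):
        pass  # stand-in for an off-policy RL update


def autonomous_training(env, forward, reset, rounds=5, horizon=100):
    state = env.state  # single initial reset; no further human intervention
    for _ in range(rounds):
        # Alternate forward (perform the task) and reset (undo the task) phases.
        for policy in (forward, reset):
            for _ in range(horizon):
                action = policy.act(state)
                next_state, r, done = env.step(action, policy.goal)
                policy.update((state, action, r, next_state))
                state = next_state
                if done:
                    break


if __name__ == "__main__":
    env = ToyEnv()
    autonomous_training(env, Policy(goal=1.0), Policy(goal=0.0))
    print("final state:", round(env.state, 3))
```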