Although deep reinforcement learning has recently been very successful at learning complex behaviors, it requires a tremendous amount of data to learn a task, let alone adapt to new tasks. One fundamental cause of this limitation lies in the trial-and-error learning paradigm of reinforcement learning, in which the agent interacts with the task and makes learning progress relying only on the reward signal, which is implicit and insufficient for learning a task well. By contrast, human beings mainly learn new skills via semantic representations or natural language instructions. However, leveraging language instructions for robotic motion control to improve adaptability is a newly emerged and challenging topic. In this paper, we present a meta-RL algorithm that addresses the challenge of learning skills with language instructions across multiple manipulation tasks. On the one hand, our algorithm utilizes language instructions to shape its interpretation of the task; on the other hand, it still learns to solve the task through trial and error. We evaluate our algorithm on a robotic manipulation benchmark (Meta-World), where it significantly outperforms state-of-the-art methods in terms of training and testing success rates. The code is available at \url{https://tumi6robot.wixsite.com/million}.