Although deep reinforcement learning has recently been very successful at learning complex behaviors, it requires a tremendous amount of data to learn a task. One of the fundamental reasons for this limitation lies in the trial-and-error learning paradigm of reinforcement learning, in which the agent interacts with the environment and makes learning progress relying only on the reward signal. Such feedback is implicit and rather insufficient for learning a task well. Humans, in contrast, are usually taught new skills via natural language instructions. Utilizing language instructions to improve the adaptability of robotic motion control is a recently emerged and challenging topic. In this paper, we present a meta-RL algorithm that addresses the challenge of learning skills from language instructions across multiple manipulation tasks. On the one hand, our algorithm utilizes language instructions to shape its interpretation of the task; on the other hand, it still learns to solve the task through trial and error. We evaluate our algorithm on the robotic manipulation benchmark Meta-World, where it significantly outperforms state-of-the-art methods in terms of training and testing task success rates. Code is available at \url{https://tumi6robot.wixsite.com/million}.