Reinforcement learning has been successful in many tasks, including robotic control, games, and energy management. In complex real-world environments with sparse rewards and long task horizons, however, sample efficiency remains a major challenge. Most complex tasks can be easily decomposed into high-level planning and low-level control. It is therefore important to enable agents to exploit this hierarchical structure and decompose larger tasks into multiple smaller sub-tasks. We introduce an approach in which sub-tasks are specified in language: a high-level planner issues language commands to a low-level controller, which executes the corresponding sub-tasks. Our experiments show that this method can solve complex long-horizon planning tasks with limited human supervision. Using language has the added benefits of interpretability and of allowing expert humans to take over the high-level planning and provide language commands when necessary.
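To make the planner/controller split concrete, the following is a minimal illustrative sketch and not the method evaluated in the paper. Everything in it is an assumption introduced for illustration: the toy corridor world (`KEY_POS`, `DOOR_POS`), the four command strings, and the hand-written rules that stand in for what would be learned policies at both levels.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical toy layout: a 1-D corridor with a key and a locked door.
KEY_POS, DOOR_POS = 3, 6


@dataclass
class State:
    pos: int = 0
    has_key: bool = False
    door_open: bool = False


def high_level_planner(s: State) -> Optional[str]:
    """Return the next sub-task as a language command, or None when done.

    Rule-based stand-in for a learned high-level policy.
    """
    if s.door_open:
        return None
    if not s.has_key:
        return "pick up the key" if s.pos == KEY_POS else "go to the key"
    return "open the door" if s.pos == DOOR_POS else "go to the door"


def low_level_controller(s: State, command: str) -> str:
    """Map a language command plus the current state to one primitive action.

    Rule-based stand-in for a learned language-conditioned policy.
    """
    if command.startswith("go to the"):
        target = KEY_POS if command.endswith("key") else DOOR_POS
        return "right" if s.pos < target else "left"
    return {"pick up the key": "pickup", "open the door": "open"}[command]


def step(s: State, action: str) -> State:
    """Toy environment dynamics; invalid actions leave the state unchanged."""
    if action == "right":
        s.pos += 1
    elif action == "left":
        s.pos -= 1
    elif action == "pickup" and s.pos == KEY_POS:
        s.has_key = True
    elif action == "open" and s.pos == DOOR_POS and s.has_key:
        s.door_open = True
    return s


def run_episode(
    planner: Callable[[State], Optional[str]] = high_level_planner,
    max_steps: int = 50,
) -> List[str]:
    """Run one episode and return the distinct commands in the order issued."""
    s = State()
    issued: List[str] = []
    for _ in range(max_steps):
        command = planner(s)
        if command is None:
            break
        if not issued or issued[-1] != command:
            issued.append(command)  # readable trace of the high-level plan
        s = step(s, low_level_controller(s, command))
    return issued


if __name__ == "__main__":
    print(run_episode())
```

Because the interface between the two levels is plain text, the list returned by `run_episode` doubles as a human-readable trace of the plan. Passing a different `planner` callable (for example, one that relays commands typed by a person) would correspond to a human taking over the high-level planning, under the assumption that the controller understands the same command vocabulary.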