When humans conceive how to perform a particular task, they do so hierarchically, splitting higher-level tasks into smaller sub-tasks. However, in the literature on natural language (NL) command of situated agents, most work has treated the procedures to be executed as flat sequences of simple actions, and any hierarchies of procedures have been shallow at best. In this paper, we propose a formalism of procedures as programs, a powerful yet intuitive method of representing hierarchical procedural knowledge for agent command and control. We further propose a modeling paradigm of hierarchical modular networks, which consist of a planner and reactors that convert NL intents into predictions of executable programs and probe the environment for the information needed to complete program execution. We instantiate this framework on the IQA and ALFRED datasets for NL instruction following. Our model outperforms reactive baselines by a large margin on both datasets. We also demonstrate that our framework is more data-efficient and that it allows for fast iterative development.
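To make the idea of procedures as programs concrete, the following is a minimal toy sketch, not the model described above. Everything in it is an assumption made for illustration: the `ToyEnv` class, the action strings, the rule-based `plan` function standing in for a learned planner, and the `locate` function standing in for a reactor that probes the environment. It only shows how a high-level procedure can call sub-procedures, which in turn issue primitive actions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToyEnv:
    """Hypothetical environment: tracks object locations and logs actions."""
    objects: Dict[str, str]
    holding: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def step(self, action: str) -> None:
        # Record a primitive action; a real simulator would execute it.
        self.log.append(action)


def locate(env: ToyEnv, obj: str) -> str:
    """Reactor stand-in (assumption): probe the environment for a location."""
    return env.objects[obj]


def go_to(env: ToyEnv, place: str) -> None:
    """Low-level procedure: a single primitive navigation action."""
    env.step(f"goto {place}")


def pick_up(env: ToyEnv, obj: str) -> None:
    """Mid-level procedure: find the object, walk to it, grasp it."""
    go_to(env, locate(env, obj))
    env.step(f"pickup {obj}")
    env.holding = obj


def put(env: ToyEnv, obj: str, dest: str) -> None:
    """High-level procedure composed of the sub-procedures above."""
    if env.holding != obj:
        pick_up(env, obj)
    go_to(env, dest)
    env.step(f"put {obj} {dest}")
    env.objects[obj] = dest
    env.holding = None


def plan(intent: str) -> Callable[[ToyEnv], None]:
    """Planner stand-in (assumption): map 'put X in Y' to an executable program."""
    words = intent.split()
    if len(words) != 4 or words[0] != "put":
        raise ValueError(f"unsupported intent: {intent!r}")
    obj, dest = words[1], words[3]
    return lambda env: put(env, obj, dest)


def run_example() -> List[str]:
    env = ToyEnv(objects={"mug": "table"})
    program = plan("put mug in sink")
    program(env)
    return env.log


if __name__ == "__main__":
    print(run_example())
```

In this sketch the flat action sequence is only produced at execution time. The program itself stays hierarchical, so a sub-procedure such as `pick_up` can be reused or replaced without touching `put`.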