LEMON: Language-Based Environment Manipulation via Execution-Guided Pre-training

Language-based environment manipulation requires agents to manipulate an environment by following natural language instructions, which is challenging because the space of possible environments is huge. Various approaches have recently been proposed to address this challenge. Although these approaches work well for their intended environments, they are difficult to generalize across environments. In this work, we propose LEMON, a general framework for language-based environment manipulation tasks. Specifically, we first specify a general approach for such tasks that can handle various environments using the same generative language model. We then propose an execution-guided pre-training strategy that injects prior knowledge of environments into the language model using a purely synthetic pre-training corpus. Experimental results on Alchemy, Scene, Tangrams, ProPara, and Recipes demonstrate the effectiveness of LEMON: it achieves new state-of-the-art results on Alchemy, Scene, ProPara, and Recipes, and the execution-guided pre-training strategy brings remarkable improvements on all experimental tasks.
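The abstract does not spell out how the synthetic pre-training corpus is built, so the following is only a minimal sketch of the general idea of execution-guided data generation: sample a random state and a random action, run the action through an executor, and emit a text pair whose target is the executed result. The toy beaker environment, the `pour`/`drain` actions, the text serialization, and every function name here are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import List, Tuple

# Hypothetical state format: one list of color codes per beaker.
State = List[List[str]]
# Hypothetical action format: (operation, source beaker index, argument).
Action = Tuple[str, int, int]


def serialize(state: State) -> str:
    """Flatten a state into text; an empty beaker is written as '_'."""
    return " | ".join(f"{i + 1}: {''.join(b) or '_'}" for i, b in enumerate(state))


def execute(state: State, action: Action) -> State:
    """Apply a toy action and return the new state (the input is not mutated)."""
    op, src, arg = action
    nxt = [list(b) for b in state]
    if op == "pour":
        # Move the whole content of beaker `src` on top of beaker `arg`.
        nxt[arg].extend(nxt[src])
        nxt[src] = []
    elif op == "drain":
        # Remove up to `arg` units from the top of beaker `src`.
        keep = max(len(nxt[src]) - arg, 0)
        nxt[src] = nxt[src][:keep]
    else:
        raise ValueError(f"unknown operation: {op}")
    return nxt


def make_example(rng: random.Random, n_beakers: int = 3, max_units: int = 3) -> Tuple[str, str]:
    """Sample a random state and action; the target is the executed result."""
    state = [
        [rng.choice("rgb") for _ in range(rng.randint(0, max_units))]
        for _ in range(n_beakers)
    ]
    if rng.random() < 0.5:
        src, dst = rng.sample(range(n_beakers), 2)
        action, action_text = ("pour", src, dst), f"pour {src + 1} {dst + 1}"
    else:
        src, amount = rng.randrange(n_beakers), rng.randint(1, max_units)
        action, action_text = ("drain", src, amount), f"drain {src + 1} {amount}"
    source = f"{action_text} ; {serialize(state)}"
    target = serialize(execute(state, action))
    return source, target


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(make_example(rng))
```

Because the targets come from running an executor rather than from human annotation, such pairs can be produced in unlimited quantity; a sequence-to-sequence language model trained on them would, under these assumptions, be exposed to the environment's dynamics before it sees any natural language instructions.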
