This study focuses on using large language models (LLMs) as a planner for embodied agents that can follow natural language instructions to complete complex tasks in a visually-perceived environment. The high data cost and poor sample efficiency of existing methods hinder the development of versatile agents that are capable of many tasks and can learn new tasks quickly. In this work, we propose a novel method, LLM-Planner, that harnesses the power of large language models to do few-shot planning for embodied agents. We further propose a simple but effective way to enhance LLMs with physical grounding to generate and update plans that are grounded in the current environment. Experiments on the ALFRED dataset show that our method can achieve very competitive few-shot performance: despite using less than 0.5% of paired training data, LLM-Planner achieves performance competitive with recent baselines that are trained on the full training data. Existing methods can barely complete any task successfully under the same few-shot setting. Our work opens the door for developing versatile and sample-efficient embodied agents that can quickly learn many tasks. Website: https://dki-lab.github.io/LLM-Planner
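To make the idea of few-shot, physically grounded planning concrete, the sketch below shows how such a planner prompt might be assembled. This is a minimal illustration, not the authors' implementation: the exemplar plan, the `call_llm` hook, and the subgoal format are all assumptions made for clarity, standing in for whatever model interface and exemplar set is actually used.

```python
from typing import List

# Hypothetical few-shot exemplars pairing an instruction with a high-level
# plan (a subgoal sequence). A real system would draw these from a small
# annotated set, possibly retrieving the exemplars most similar to the
# current instruction.
EXEMPLARS = [
    (
        "Put a washed apple on the dining table.",
        ["Navigate to apple", "Pick up apple", "Navigate to sink",
         "Put apple in sink", "Turn on faucet", "Turn off faucet",
         "Pick up apple", "Navigate to dining table",
         "Put apple on dining table"],
    ),
]


def build_prompt(instruction: str, visible_objects: List[str],
                 completed_subgoals: List[str]) -> str:
    """Assemble a few-shot prompt with physical grounding: in-context
    exemplars, the current instruction, the objects observed so far, and
    any subgoals already completed."""
    parts = ["Create a high-level plan for completing a household task."]
    for task, plan in EXEMPLARS:
        parts.append(f"Task: {task}\nPlan: {', '.join(plan)}")
    parts.append(
        f"Task: {instruction}\n"
        f"Visible objects: {', '.join(visible_objects) or 'none'}\n"
        f"Completed subgoals: {', '.join(completed_subgoals) or 'none'}\n"
        "Plan:"
    )
    return "\n\n".join(parts)


def call_llm(prompt: str) -> str:
    """Placeholder for the language-model call; swap in a real API client."""
    raise NotImplementedError


def plan(instruction: str, visible_objects: List[str],
         completed_subgoals: List[str]) -> List[str]:
    """Ask the LLM for the remaining subgoals. Calling this again as the
    agent observes new objects or gets stuck gives dynamic re-planning
    grounded in the current environment."""
    prompt = build_prompt(instruction, visible_objects, completed_subgoals)
    return [s.strip() for s in call_llm(prompt).split(",") if s.strip()]
```

Under these assumptions, the agent would execute the returned subgoals with a low-level controller and invoke `plan` again whenever its perception of the scene changes, which is what keeps the generated plan grounded rather than fixed at the start of the episode.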