Recent work has shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to language. LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them, and the answers change over time in response to the agent's own choices. In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training. We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios. We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction. We find that closed-loop language feedback significantly improves high-level instruction completion in three domains, including simulated and real tabletop rearrangement tasks and long-horizon mobile manipulation tasks in a real-world kitchen environment.
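To make the closed-loop idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a planner that maps the running text transcript to the next skill, and it appends success-detection and scene-description feedback as plain text after each step. All names here (`run_closed_loop`, `plan_next_skill`, `execute_skill`, `describe_scene`, the `"done"` stop token, and the toy environment in `demo`) are hypothetical, and the LLM is replaced by a hand-written stub so the example runs on its own.

```python
from typing import Callable, List


def run_closed_loop(
    instruction: str,
    plan_next_skill: Callable[[str], str],
    execute_skill: Callable[[str], bool],
    describe_scene: Callable[[], str],
    max_steps: int = 10,
) -> List[str]:
    """Grow a text transcript by appending language feedback after each skill."""
    transcript = [f"Human: {instruction}"]
    for _ in range(max_steps):
        # The planner (an LLM in the paper's setting, a stub here) sees
        # the whole transcript so far, including earlier feedback lines.
        skill = plan_next_skill("\n".join(transcript))
        if skill == "done":  # hypothetical stop token
            break
        transcript.append(f"Robot: {skill}")
        success = execute_skill(skill)
        # Feedback is injected as text: success detection, then scene description.
        transcript.append(f"Success: {success}")
        transcript.append(f"Scene: {describe_scene()}")
    return transcript


def demo() -> List[str]:
    """Toy environment: the first grasp attempt fails, the second succeeds."""
    state = {"attempts": 0, "holding": False}

    def planner(prompt: str) -> str:
        # Stub standing in for an LLM: stop once the scene feedback
        # reports the block in the gripper, otherwise (re)try the pick.
        last_line = prompt.splitlines()[-1]
        if "block in gripper" in last_line:
            return "done"
        return "pick up the block"

    def executor(skill: str) -> bool:
        state["attempts"] += 1
        state["holding"] = state["attempts"] >= 2
        return state["holding"]

    def scene() -> str:
        return "block in gripper" if state["holding"] else "block on table"

    return run_closed_loop("pick up the block", planner, executor, scene)


if __name__ == "__main__":
    print("\n".join(demo()))
```

In this toy run the stub planner retries the pick only because the failed attempt was written back into the transcript; an open-loop planner with the same stub would have no signal that the first grasp failed.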