Recent large language models (LLMs) have demonstrated remarkable performance on a variety of natural language processing (NLP) tasks, leading to intense excitement about their applicability across various domains. Unfortunately, recent work has also shown that LLMs are unable to perform accurate reasoning or to solve planning problems, which may limit their usefulness for robotics-related tasks. In this work, our central question is whether LLMs are able to translate goals specified in natural language into a structured planning language. If so, an LLM can act as a natural interface between the planner and human users; the translated goal can then be handed to domain-independent AI planners that are very effective at planning. Our empirical results on GPT-3.5 variants show that LLMs are far better suited to translation than to planning. We find that LLMs are able to leverage commonsense knowledge and reasoning to furnish missing details in under-specified goals (as is often the case in natural language). However, our experiments also reveal that LLMs can fail to generate goals in tasks that involve numerical or physical (e.g., spatial) reasoning, and that LLMs are sensitive to the prompts used. As such, these models are promising for translation to structured planning languages, but care should be taken in their use.
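To make the proposed pipeline concrete, the following is a minimal sketch, not the paper's implementation, of using an LLM as a natural-language-to-PDDL goal translator whose output is handed to an off-the-shelf planner. The prompt wording, the translate_goal helper, and the toy Blocksworld problem are illustrative assumptions; only the OpenAI chat-completions call and the Fast Downward command line reflect real interfaces.

```python
# Sketch: an LLM translates a natural-language goal into a PDDL goal
# expression, which is spliced into a problem file for a classical
# planner. Prompt and helper names are assumptions for illustration.
import subprocess
from openai import OpenAI  # real client; requires OPENAI_API_KEY

client = OpenAI()

PROMPT_TEMPLATE = """You are a translator from natural language to PDDL.
Domain predicates: (on ?x ?y) (on-table ?x) (clear ?x).
Translate the goal below into a single PDDL :goal expression.
Answer with the s-expression only.

Goal: {goal}"""

def translate_goal(nl_goal: str) -> str:
    """Ask the LLM for a PDDL goal expression (hypothetical helper)."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": PROMPT_TEMPLATE.format(goal=nl_goal)}],
        temperature=0,  # favor deterministic, structured output
    )
    return resp.choices[0].message.content.strip()

goal_expr = translate_goal("Stack block A on block B.")
# e.g. "(and (on a b))" -- in practice the output should be validated
# (parsed, predicates checked against the domain) before planning.

problem = f"""(define (problem demo) (:domain blocksworld)
  (:objects a b)
  (:init (on-table a) (on-table b) (clear a) (clear b))
  (:goal {goal_expr}))"""
with open("problem.pddl", "w") as f:
    f.write(problem)

# Hand the translated goal to a domain-independent planner
# (Fast Downward shown; any PDDL planner would do).
subprocess.run(["fast-downward.py", "domain.pddl", "problem.pddl",
                "--search", "astar(lmcut())"])
```

In this division of labor the LLM only produces the goal specification; correctness of the plan itself is guaranteed by the planner, which is why translation failures (e.g., on numerical or spatial goals) are the main failure mode to guard against.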