Multi-object rearrangement is a crucial skill for service robots, and commonsense reasoning is frequently needed in this process. However, achieving commonsense arrangements requires knowledge about objects, which is hard to transfer to robots. Large language models (LLMs) are one potential source of this knowledge, but they do not naively capture information about plausible physical arrangements of the world. We propose LLM-GROP, which uses prompting to extract commonsense knowledge about semantically valid object configurations from an LLM and instantiates them with a task and motion planner in order to generalize to varying scene geometry. LLM-GROP allows us to go from natural-language commands to human-aligned object rearrangement in varied environments. Based on human evaluations, our approach achieves the highest rating and outperforms competitive baselines in success rate while maintaining comparable cumulative action costs. Finally, we demonstrate a practical implementation of LLM-GROP on a mobile manipulator in real-world scenarios. Supplementary materials are available at: https://sites.google.com/view/llm-grop
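The core idea of prompting an LLM for a semantically valid arrangement, then grounding it for planning, can be sketched as below. The prompt wording, the response format, and the parsing logic are illustrative assumptions for this sketch, not the paper's actual prompts or implementation; a real system would send the prompt to an LLM API and pass the parsed spatial relations to a task and motion planner.

```python
# Hypothetical sketch of the LLM-GROP prompting step: ask an LLM for a
# commonsense object arrangement as symbolic spatial relations, which a
# task and motion planner could later instantiate geometrically.
# Prompt text and reply format here are assumptions, not the paper's own.

def build_arrangement_prompt(objects, context):
    """Compose a prompt asking where each object belongs relative to others."""
    return (
        f"You are setting a {context}. "
        f"For each of these objects: {', '.join(objects)}, "
        "state its placement as '<object>: <relation> <anchor>', one per line."
    )

def parse_relations(llm_reply):
    """Parse lines like 'fork: left of plate' into (object, relation, anchor)."""
    relations = []
    for line in llm_reply.strip().splitlines():
        obj, _, rest = line.partition(":")
        # Treat the last word as the anchor object, the rest as the relation.
        relation, _, anchor = rest.strip().rpartition(" ")
        relations.append((obj.strip(), relation.strip(), anchor.strip()))
    return relations

prompt = build_arrangement_prompt(["fork", "knife", "plate"], "dinner table")
# A plausible LLM reply; in practice this would come from an API call.
reply = "fork: left of plate\nknife: right of plate"
print(parse_relations(reply))
```

The symbolic relations are deliberately geometry-free: keeping the LLM output at the level of "left of plate" lets the downstream motion planner choose concrete coordinates that fit the actual scene geometry.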