Embodied AI focuses on the study and development of intelligent systems that possess a physical or virtual embodiment (e.g., robots) and are able to dynamically interact with their environment. Memory and control are the two essential components of an embodied system, and each usually requires a separate framework to model it. In this paper, we propose a novel and generalizable framework called LLM-Brain: using a Large-scale Language Model as a robotic brain to unify egocentric memory and control. The LLM-Brain framework integrates multiple multimodal language models for robotic tasks, utilizing a zero-shot learning approach. All components within LLM-Brain communicate using natural language in closed-loop multi-round dialogues that encompass perception, planning, control, and memory. At the core of the system is an embodied LLM that maintains egocentric memory and controls the robot. We demonstrate LLM-Brain on two downstream tasks: active exploration and embodied question answering. The active exploration tasks require the robot to extensively explore an unknown environment within a limited number of actions, while the embodied question answering tasks require the robot to answer questions based on observations acquired during prior exploration.
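To make the closed-loop multi-round dialogue concrete, below is a minimal runnable sketch of the control flow the abstract describes. It is not the authors' implementation: `perceive` and `query_llm` are hypothetical stand-ins for the paper's multimodal perception models and the embodied LLM. The point illustrated is that the growing natural-language dialogue history serves as the egocentric memory, while the LLM's replies serve as control commands.

```python
# A minimal sketch of LLM-Brain's closed-loop dialogue (assumed structure,
# not the authors' code). Perception is verbalized into natural language,
# the LLM replies with an action, and the dialogue history is the memory.

from dataclasses import dataclass, field

def perceive(frame: object) -> str:
    """Placeholder: a multimodal model would caption the egocentric view."""
    return "A hallway with a door on the left."

def query_llm(dialogue: list[str]) -> str:
    """Placeholder: a chat LLM would continue the dialogue."""
    return "move_forward"

@dataclass
class RoboticBrain:
    # The dialogue history doubles as egocentric memory: everything the
    # robot has seen and done, expressed in natural language.
    history: list[str] = field(default_factory=list)

    def step(self, observation: str) -> str:
        self.history.append(f"Observation: {observation}")
        action = query_llm(self.history + ["What action should the robot take next?"])
        self.history.append(f"Action: {action}")
        return action

def active_exploration(brain: RoboticBrain, max_actions: int = 50) -> None:
    """Explore an unknown environment within a limited action budget."""
    for _ in range(max_actions):
        caption = perceive(frame=None)   # egocentric observation -> text
        brain.step(caption)              # one closed-loop dialogue round

def embodied_qa(brain: RoboticBrain, question: str) -> str:
    """Answer a question from the memory accumulated during exploration."""
    return query_llm(brain.history + [f"Question: {question}"])

if __name__ == "__main__":
    brain = RoboticBrain()
    active_exploration(brain, max_actions=3)
    print(embodied_qa(brain, "Is there a door in the hallway?"))
```

Note that under this reading, both downstream tasks reuse the same dialogue history: active exploration fills the memory within the action budget, and embodied question answering conditions the LLM on that accumulated memory.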