The integration of Large Language Models (LLMs) into robotics has unlocked unprecedented capabilities in high-level task planning. However, most current systems operate in an open-loop fashion, with the LLM acting as a one-shot planner, which leaves them brittle and unable to adapt to unforeseen circumstances in dynamic physical environments. To overcome this limitation, this paper introduces the "Think, Act, Learn" (T-A-L) framework, a novel architecture that enables an embodied agent to autonomously learn and refine its policies through continuous interaction. Our framework establishes a closed-loop cycle in which an LLM first "thinks" by decomposing high-level commands into actionable plans. The robot then "acts" by executing these plans while gathering rich, multimodal sensory feedback. Critically, the "learn" module processes this feedback to drive LLM-based self-reflection, allowing the agent to perform causal analysis of its failures and generate corrective strategies. These insights are stored in an experiential memory that guides future planning cycles. Through extensive experiments in both simulation and the real world, we demonstrate that our T-A-L agent significantly outperforms baseline methods, including open-loop LLM planning, Behavioral Cloning, and traditional Reinforcement Learning. Our framework achieves a success rate of over 97% on complex, long-horizon tasks, converges to a stable policy in an average of just 9 trials, and exhibits strong generalization to unseen tasks. This work represents a significant step toward more robust, adaptive, and truly autonomous robotic agents.
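The closed-loop cycle described in the abstract can be sketched as a minimal skeleton. This is a hedged illustration, not the paper's implementation: the class and method names (`TALAgent`, `think`, `act`, `learn`), the `Experience` record, and the toy environment are all assumptions introduced for exposition; in the actual framework the "think" and "learn" steps would call an LLM rather than the simple stubs used here.

```python
from dataclasses import dataclass, field

@dataclass
class Experience:
    """One entry in the experiential memory (hypothetical schema)."""
    task: str
    plan: list
    success: bool
    reflection: str  # corrective strategy distilled from failure analysis

@dataclass
class TALAgent:
    memory: list = field(default_factory=list)

    def think(self, task):
        # "Think": decompose the command into a plan (stub for an LLM planner),
        # consulting experiential memory for corrective strategies from past failures.
        plan = [f"step: {task}"]
        for exp in self.memory:
            if exp.task == task and not exp.success and exp.reflection:
                plan.insert(0, exp.reflection)  # apply the corrective step first
        return plan

    def act(self, plan, env):
        # "Act": execute the plan and collect feedback (stubbed as a dict;
        # the real system would return multimodal sensory data).
        return env(plan)

    def learn(self, task, plan, feedback):
        # "Learn": self-reflection on failure (stub for LLM-driven causal analysis),
        # storing the resulting insight in experiential memory.
        reflection = "" if feedback["success"] else f"avoid: {feedback['error']}"
        self.memory.append(Experience(task, plan, feedback["success"], reflection))
        return feedback["success"]

    def run(self, task, env, max_trials=9):
        # Closed loop: think -> act -> learn, until success or the trial budget runs out.
        for trial in range(1, max_trials + 1):
            plan = self.think(task)
            feedback = self.act(plan, env)
            if self.learn(task, plan, feedback):
                return trial  # number of trials needed to succeed
        return None

def toy_env(plan):
    # Hypothetical environment: fails until the plan contains the corrective
    # step learned from the first failure.
    if any(step.startswith("avoid:") for step in plan):
        return {"success": True, "error": None}
    return {"success": False, "error": "gripper slip"}
```

Running `TALAgent().run("stack blocks", toy_env)` fails on the first trial, stores a corrective reflection, and succeeds on the second, illustrating how the memory closes the loop between failure analysis and subsequent planning.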