Recent advancements in decision-making large language model (LLM) agents have demonstrated impressive performance across various benchmarks. However, these state-of-the-art approaches typically necessitate internal model fine-tuning, external model fine-tuning, or policy optimization over a defined state space. Implementing these methods can prove challenging due to the scarcity of high-quality training data or the lack of a well-defined state space. Moreover, these agents do not possess certain qualities inherent to human decision-making processes, specifically the ability to learn from mistakes. Self-reflection allows humans to efficiently solve novel problems through a process of trial and error. Building on recent research, we propose Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. To achieve full automation, we introduce a straightforward yet effective heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment. To assess our approach, we evaluate the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments. We observe success rates of 97% and 51%, respectively, and provide a discussion of the emergent property of self-reflection.
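To make the trigger heuristic concrete, the sketch below illustrates one plausible form it could take: flag an episode for self-reflection when the agent keeps repeating the same (action, observation) cycle (a common symptom of hallucinated or stuck behavior) or exceeds a step budget without finishing. The function name, thresholds, and trajectory format are illustrative assumptions, not the paper's exact implementation.

```python
from collections import Counter

def should_reflect(trajectory, max_repeats=3, max_steps=30):
    """Minimal sketch of a reflection-trigger heuristic (assumed details).

    trajectory: list of (action, observation) string pairs from one episode.
    Returns True when the agent appears stuck (repeated action/observation
    cycles) or has run too long, signaling that a self-reflection should be
    generated before the next attempt.
    """
    cycle_counts = Counter((a, o) for a, o in trajectory)
    repeated = any(count >= max_repeats for count in cycle_counts.values())
    too_long = len(trajectory) >= max_steps
    return repeated or too_long

# Example: an agent looping on the same failed action in an AlfWorld-style task.
trajectory = [("go to desk 1", "Nothing happens.")] * 4
print(should_reflect(trajectory))  # True -> write a reflection to memory and retry
```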