Recent advancements in decision-making large language model (LLM) agents have demonstrated impressive performance across various benchmarks. However, these state-of-the-art approaches typically necessitate internal model fine-tuning, external model fine-tuning, or policy optimization over a defined state space. Implementing these methods can prove challenging due to the scarcity of high-quality training data or the lack of a well-defined state space. Moreover, these agents do not possess certain qualities inherent to human decision-making processes, specifically the ability to learn from mistakes. Self-reflection allows humans to efficiently solve novel problems through a process of trial and error. Building on recent research, we propose Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. To achieve full automation, we introduce a straightforward yet effective heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment. To assess our approach, we evaluate the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments. We observe success rates of 97% and 51%, respectively, and provide a discussion of the emergent property of self-reflection.
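To make the trigger heuristic concrete, the sketch below illustrates one plausible form it could take: flag an episode for self-reflection when the agent keeps repeating the same (action, observation) cycle (a common symptom of hallucinated or stuck behavior) or exceeds a step budget without finishing. The function name, thresholds, and trajectory format are illustrative assumptions, not the paper's exact implementation.

```python
from collections import Counter

def should_reflect(trajectory, max_repeats=3, max_steps=30):
    """Minimal sketch of a reflection-trigger heuristic (assumed details).

    trajectory: list of (action, observation) string pairs from one episode.
    Returns True when the agent appears stuck (repeated action/observation
    cycles) or has run too long, signaling that a self-reflection should be
    generated before the next attempt.
    """
    cycle_counts = Counter((a, o) for a, o in trajectory)
    repeated = any(count >= max_repeats for count in cycle_counts.values())
    too_long = len(trajectory) >= max_steps
    return repeated or too_long

# Example: an agent looping on the same failed action in an AlfWorld-style task.
trajectory = [("go to desk 1", "Nothing happens.")] * 4
print(should_reflect(trajectory))  # True -> write a reflection to memory and retry
```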