While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples. Project site with code: https://react-lm.github.io
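The abstract describes the method at a high level: the model interleaves free-form reasoning traces ("Thoughts") with task actions, and observations from the environment are fed back into the context. Below is a minimal Python sketch of that Thought/Action/Observation loop for the QA setting, assuming a hypothetical `llm()` completion function and a hypothetical `wikipedia_search()` tool; it is an illustration of the idea, not the authors' released implementation (see the project site above for that).

```python
# Minimal ReAct-style loop (illustrative sketch, not the paper's code).
# llm(prompt) -> str and wikipedia_search(query) -> str are hypothetical
# placeholders for whatever LLM API and Wikipedia API a real system uses.

import re

def llm(prompt: str) -> str:
    """Hypothetical LLM completion call; replace with a real API."""
    raise NotImplementedError

def wikipedia_search(query: str) -> str:
    """Hypothetical Wikipedia lookup; replace with a real API client."""
    raise NotImplementedError

FEW_SHOT = "..."  # one or two in-context ReAct examples, as in the paper

def react(question: str, max_steps: int = 8) -> str:
    prompt = f"{FEW_SHOT}\nQuestion: {question}\n"
    for _ in range(max_steps):
        # Each step, the model emits a reasoning trace (Thought)
        # followed by an Action.
        step = llm(prompt)
        prompt += step
        # Parse actions in the paper's Search[...] / Finish[...] style.
        match = re.search(r"Action: (\w+)\[(.*?)\]", step)
        if match is None:
            continue
        act, arg = match.groups()
        if act == "Finish":
            return arg  # the model decided it has the answer
        if act == "Search":
            # The action interfaces with an external knowledge source;
            # the observation is appended back into the context.
            obs = wikipedia_search(arg)
            prompt += f"\nObservation: {obs}\n"
    return ""  # no answer within the step budget
```

The key design point the abstract emphasizes is visible here: reasoning steps shape which actions are taken, and action results ground subsequent reasoning, which is what lets the approach reduce hallucination relative to chain-of-thought prompting alone.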