Agents capable of carrying out general tasks on a computer can improve efficiency and productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally, such agents should be able to solve new computer tasks presented to them through natural language commands. However, previous approaches to this problem require large amounts of expert demonstrations and task-specific reward functions, both of which are impractical for new tasks. In this work, we show that a pre-trained large language model (LLM) agent can execute computer tasks guided by natural language using a simple prompting scheme where the agent recursively criticizes and improves its output (RCI). The RCI approach significantly outperforms existing LLM methods for automating computer tasks and surpasses supervised learning (SL) and reinforcement learning (RL) approaches on the MiniWoB++ benchmark. RCI is competitive with the state-of-the-art SL+RL method, using only a handful of demonstrations per task rather than tens of thousands, and without a task-specific reward function. Furthermore, we demonstrate RCI prompting's effectiveness in enhancing LLMs' reasoning abilities on a suite of natural language reasoning tasks, outperforming chain of thought (CoT) prompting. We find that RCI combined with CoT performs better than either separately.
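To make the prompting scheme concrete, the sketch below illustrates one plausible way an RCI-style loop could be wired up: the model produces an initial answer, is asked to criticize it, and is then asked to improve it given that critique. This is a minimal illustration under assumptions, not the paper's exact prompts; the `call_llm` helper, the prompt wording, and the number of rounds are all hypothetical placeholders.

```python
# Minimal sketch of an RCI-style critique-and-improve loop.
# NOTE: `call_llm`, the prompt texts, and `num_rounds` are illustrative
# assumptions, not the paper's verbatim method.

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to an LLM and return its completion."""
    raise NotImplementedError("Plug in your LLM client here.")

def rci(task: str, num_rounds: int = 2) -> str:
    """Answer a task, then recursively criticize and improve the answer."""
    # Initial attempt at the task.
    output = call_llm(f"Task: {task}\nAnswer:")
    for _ in range(num_rounds):
        # Ask the model to find problems with its own output.
        critique = call_llm(
            f"Task: {task}\nProposed answer: {output}\n"
            "Review this answer and point out any mistakes or omissions."
        )
        # Ask the model to revise the output based on that critique.
        output = call_llm(
            f"Task: {task}\nProposed answer: {output}\nCritique: {critique}\n"
            "Provide an improved answer that addresses the critique."
        )
    return output
```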