AI agents built on foundation models hold enormous promise. Current practice, however, follows a one-task-one-agent approach, which not only lacks scalability and generality but also suffers from the practical limitations of black-box autoregressive reasoning, where decisions unfold token by token without explicit simulation or counterfactual evaluation of outcomes. Humans, by contrast, reason and plan by mentally simulating the consequences of actions within an internal model of the world, a capability that supports flexible, goal-directed behavior across diverse contexts. As a step toward a more general and powerful AI agent, we introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning. Based on a principled formulation of an optimal agent in any general environment, SimuRA addresses the limitations of black-box autoregressive reasoning by incorporating a world model for planning via simulation. Our prototype world model is implemented using LLMs as a substrate, leveraging natural language as a discrete, hierarchical representation grounded in concepts for planning, while remaining model-agnostic. On complex web-browsing tasks such as flight search, SimuRA improves the success rate from 0% to 32.2% over a representative open-web agent baseline. Across tasks, world-model-based planning achieves up to 124% higher task completion rates than a matched black-box autoregressive baseline, demonstrating the advantages of simulative reasoning. We release ReasonerAgent-Web, a web-browsing agent built on SimuRA, as an open-source research demo.
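To make the idea of planning via simulation concrete, the following is a minimal sketch and not the SimuRA implementation. It assumes a one-step lookahead: candidate actions are proposed, a world model predicts the state each would lead to, and the action with the best predicted outcome is chosen. All names (`plan_by_simulation`, `propose_actions`, `world_model`, `goal_score`) are hypothetical, and toy integer functions stand in for the LLM-based world model over natural-language states described above.

```python
from typing import Callable, Iterable, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def plan_by_simulation(
    state: State,
    propose_actions: Callable[[State], Iterable[Action]],
    world_model: Callable[[State, Action], State],
    goal_score: Callable[[State], float],
) -> Action:
    """Pick the action whose simulated outcome scores highest against the goal."""
    best_action, best_score = None, float("-inf")
    for action in propose_actions(state):
        # Simulate the outcome with the world model; nothing is executed
        # in the real environment at this point.
        predicted = world_model(state, action)
        score = goal_score(predicted)
        if score > best_score:
            best_action, best_score = action, score
    if best_action is None:
        raise ValueError("no candidate actions were proposed")
    return best_action


# Toy stand-ins (assumptions for illustration only): the state is an integer
# and the goal is to get as close to 10 as possible.
def toy_propose(state: int) -> list[int]:
    return [-1, 1, 3]


def toy_world_model(state: int, action: int) -> int:
    return state + action


def toy_goal_score(state: int) -> float:
    return -abs(10 - state)


chosen = plan_by_simulation(4, toy_propose, toy_world_model, toy_goal_score)
print(chosen)
```

The contrast with black-box autoregressive reasoning is that each candidate is evaluated by its predicted consequence before the agent commits to it. A fuller planner would presumably simulate several steps ahead, which this sketch does not attempt.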