DeepAgent: A General Reasoning Agent with Scalable Toolsets

Large reasoning models have demonstrated strong problem-solving abilities, yet real-world tasks often require external tools and long-horizon interactions. Existing agent frameworks typically follow predefined workflows, which limit autonomous and global task completion. In this paper, we introduce DeepAgent, an end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution within a single, coherent reasoning process. To address the challenges of long-horizon interactions, particularly the context length explosion from multiple tool calls and the accumulation of interaction history, we introduce an autonomous memory folding mechanism that compresses past interactions into structured episodic, working, and tool memories, reducing error accumulation while preserving critical information. To teach general-purpose tool use efficiently and stably, we develop an end-to-end reinforcement learning strategy, namely ToolPO, that leverages LLM-simulated APIs and applies tool-call advantage attribution to assign fine-grained credit to the tool invocation tokens. Extensive experiments on eight benchmarks, including general tool-use tasks (ToolBench, API-Bank, TMDB, Spotify, ToolHop) and downstream applications (ALFWorld, WebShop, GAIA, HLE), demonstrate that DeepAgent consistently outperforms baselines across both labeled-tool and open-set tool retrieval scenarios. This work takes a step toward more general and capable agents for real-world applications. The code and demo are available at https://github.com/RUC-NLPIR/DeepAgent.
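The tool-call advantage attribution mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the formula, so everything below is an assumption made for illustration: each rollout is assumed to receive a task-level reward and a separate tool-call reward, both are normalized within a group of rollouts, the task-level advantage is applied to every generated token, and the tool-call advantage is added only on tokens that belong to a tool invocation. The function names, the `tool_weight` parameter, and the 0/1 mask are hypothetical and are not taken from the DeepAgent code base.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards):
    """Normalize rewards within one group of rollouts (zero mean, unit std).

    Assumption: a group-relative baseline is used; the abstract does not
    specify how advantages are computed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All rollouts scored the same, so no rollout is preferred.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def attribute_advantages(task_adv, tool_adv, tool_call_mask, tool_weight=1.0):
    """Build per-token advantages for a single rollout.

    Every token receives the task-level advantage. Tokens flagged in
    tool_call_mask (1 = part of a tool invocation, 0 = other text)
    additionally receive the weighted tool-call advantage.
    tool_weight is a hypothetical mixing coefficient.
    """
    return [task_adv + tool_weight * tool_adv * m for m in tool_call_mask]


if __name__ == "__main__":
    # Four hypothetical rollouts for the same task.
    task_rewards = [1.0, 0.0, 1.0, 0.0]   # did the task succeed?
    tool_rewards = [1.0, 1.0, 0.0, 0.0]   # were the tool calls well formed?

    task_advs = group_relative_advantages(task_rewards)
    tool_advs = group_relative_advantages(tool_rewards)

    # Rollout 0: tokens 1 and 2 form the tool call.
    mask = [0, 1, 1, 0]
    print(attribute_advantages(task_advs[0], tool_advs[0], mask))
```

Under these assumptions, a rollout that fails the task but issues a correct tool call still receives a less negative signal on its tool-call tokens than on the rest of its output, which is one way to read "fine-grained credit" in the abstract.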
