通义深度研究技术报告 (Tongyi DeepResearch Technical Report)

Tongyi DeepResearch Team,Baixuan Li,Bo Zhang,Dingchu Zhang,Fei Huang,Guangyu Li,Guoxin Chen,Huifeng Yin,Jialong Wu,Jingren Zhou,Kuan Li,Liangcai Su,Litu Ou,Liwen Zhang,Pengjun Xie,Rui Ye,Wenbiao Yin,Xinmiao Yu,Xinyu Wang,Xixi Wu,Xuanzhong Chen,Yida Zhao,Zhen Zhang,Zhengwei Tao,Zhongwang Zhang,Zile Qiao,Chenxi Wang,Donglei Yu,Gang Fu,Haiyang Shen,Jiayin Yang,Jun Lin,Junkai Zhang,Kui Zeng,Li Yang,Hailong Yin,Maojia Song,Ming Yan,Peng Xia,Qian Xiao,Rui Min,Ruixue Ding,Runnan Fang,Shaowei Chen,Shen Huang,Shihang Wang,Shihao Cai,Weizhou Shen,Xiaobin Wang,Xin Guan,Xinyu Geng,Yingcheng Shi,Yuning Wu,Zhuo Chen,Zijian Li,Yong Jiang

from arxiv, https://tongyi-agent.github.io/blog

We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, without relying on costly human annotation, and empowers all training stages. By constructing customized environments for each stage, our system enables stable and consistent interactions throughout. Tongyi DeepResearch, featuring 30.5 billion total parameters, with only 3.3 billion activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES and xbench-DeepSearch-2510. We open-source the model, framework, and complete solutions to empower the community.

翻译：我们提出通义深度研究，一种面向长周期、深度信息检索研究任务的智能体大型语言模型。为激励自主深度研究能力，通义深度研究通过结合智能体中期训练与智能体后期训练的端到端训练框架开发，实现了跨复杂任务的可扩展推理与信息检索。我们设计了一个高度可扩展、完全自动化且不依赖昂贵人工标注的数据合成流程，该流程支撑所有训练阶段。通过为每个阶段构建定制化环境，我们的系统确保了全流程稳定且一致的交互。通义深度研究模型总参数量为305亿，每令牌仅激活33亿参数，在一系列智能体深度研究基准测试中取得最先进性能，包括Humanity's Last Exam、BrowseComp、BrowseComp-ZH、WebWalkerQA、xbench-DeepSearch、FRAMES及xbench-DeepSearch-2510。我们开源模型、框架及完整解决方案，以赋能研究社区。