Recent advances in large language models have enabled AI systems to achieve expert-level performance on domain-specific scientific tasks, yet these systems remain narrow and handcrafted. We introduce SciAgent, a unified multi-agent system designed for generalistic scientific reasoning, that is, the ability to adapt reasoning strategies across disciplines and difficulty levels. SciAgent organizes problem solving as a hierarchical process: a Coordinator Agent interprets each problem's domain and complexity and dynamically orchestrates specialized Worker Systems, each composed of interacting reasoning Sub-agents for symbolic deduction, conceptual modeling, numerical computation, and verification. These agents collaboratively assemble and refine reasoning pipelines tailored to each task. Across mathematics and physics Olympiads (IMO, IMC, IPhO, CPhO), SciAgent consistently attains or surpasses human gold-medalist performance, demonstrating both domain generality and reasoning adaptability. SciAgent has also been tested on the International Chemistry Olympiad (IChO) and on selected problems from the Humanity's Last Exam (HLE) benchmark, further confirming the system's ability to generalize across diverse scientific domains. This work establishes SciAgent as a concrete step toward generalistic scientific intelligence: AI systems capable of coherent, cross-disciplinary reasoning at expert levels.
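The hierarchical organization described in the abstract (a Coordinator Agent routing each problem to a Worker System made of Sub-agents) can be illustrated with a minimal sketch. This is not SciAgent's implementation: every name here (`Coordinator`, `WorkerSystem`, `make_sub_agent`, `build_coordinator`) is hypothetical, the keyword lookup is a stand-in for whatever domain and complexity analysis the real Coordinator performs, and the fixed four-stage pipeline is an assumption made only to keep the example small, whereas the abstract describes pipelines that are assembled and refined per task.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a sub-agent is a function that reads the problem
# and appends its contribution to a shared reasoning trace.
SubAgent = Callable[[str, List[str]], None]


def make_sub_agent(role: str) -> SubAgent:
    """Create a placeholder sub-agent that only records that it ran."""
    def run(problem: str, trace: List[str]) -> None:
        trace.append(f"{role}: processed '{problem}'")
    return run


@dataclass
class WorkerSystem:
    """A domain-specific group of sub-agents run as a fixed pipeline (assumption)."""
    domain: str
    sub_agents: List[SubAgent]

    def solve(self, problem: str) -> List[str]:
        trace: List[str] = []
        for agent in self.sub_agents:
            agent(problem, trace)
        return trace


@dataclass
class Coordinator:
    """Routes each problem to one worker system."""
    workers: Dict[str, WorkerSystem]
    keywords: Dict[str, List[str]]  # hypothetical routing table

    def classify(self, problem: str) -> str:
        # Keyword matching is a toy stand-in for model-based domain analysis.
        text = problem.lower()
        for domain, words in self.keywords.items():
            if any(word in text for word in words):
                return domain
        raise ValueError("no worker system matches this problem")

    def solve(self, problem: str) -> Tuple[str, List[str]]:
        domain = self.classify(problem)
        return domain, self.workers[domain].solve(problem)


def build_coordinator() -> Coordinator:
    # The four roles are taken from the abstract; giving every worker
    # system all four of them is an assumption of this sketch.
    roles = ["symbolic deduction", "conceptual modeling",
             "numerical computation", "verification"]
    workers = {
        domain: WorkerSystem(domain, [make_sub_agent(r) for r in roles])
        for domain in ("math", "physics")
    }
    keywords = {"math": ["prove", "integer"], "physics": ["force", "velocity"]}
    return Coordinator(workers, keywords)


if __name__ == "__main__":
    domain, trace = build_coordinator().solve("Find the velocity of the block.")
    print(domain)
    for step in trace:
        print(step)
```

In this sketch the Coordinator only selects a worker; a closer approximation of the described system would also let it choose which sub-agents to run and in what order, based on the problem's estimated complexity.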