Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery, yet their static knowledge and propensity to hallucinate hinder their application in autonomous research. Recent advances integrate LLMs into agentic frameworks that enable retrieval, reasoning, and tool use for complex scientific workflows. Here, we present a domain-specialized agent designed for the reliable automation of first-principles materials computations. By embedding domain expertise, the agent ensures physically coherent multi-step workflows and consistently selects convergent, well-posed computational parameters, thereby enabling reliable end-to-end execution. On a new benchmark of diverse computational tasks, our system significantly outperforms standalone LLMs in both accuracy and robustness. This work establishes a verifiable foundation for autonomous computational experimentation and represents a key step toward fully automated scientific discovery.