Large language models (LLMs) hold promise as scientific assistants, yet existing agents rely either solely on algorithm evolution or solely on deep research, and each approach has critical limitations. Pure algorithm evolution, as in AlphaEvolve, depends only on an LLM's internal knowledge and quickly plateaus in complex domains, whereas pure deep research proposes ideas without validating them, yielding unrealistic or unimplementable solutions. We present DeepEvolve, an agent that integrates deep research with algorithm evolution, uniting external knowledge retrieval, cross-file code editing, and systematic debugging in a feedback-driven iterative loop. Each iteration not only proposes new hypotheses but also refines, implements, and tests them, avoiding both shallow improvements and unproductive over-refinement. Across nine benchmarks in chemistry, mathematics, biology, materials, and patents, DeepEvolve consistently improves the initial algorithm, producing executable new algorithms with sustained gains. By bridging the gap between unguided evolution and ungrounded research, DeepEvolve provides a reliable framework for advancing scientific algorithm discovery. Our code is available at https://github.com/liugangcode/deepevolve.
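To make the shape of a feedback-driven research, implement, debug, and evaluate loop concrete, here is a minimal toy sketch. It is not DeepEvolve's implementation: every name (`research`, `implement`, `debug_and_run`, `evolve_sketch`, `objective`, `repair`), the encoding of a hypothesis as a `"key:delta"` string, and the toy objective are hypothetical stand-ins. "Deep research" is stubbed by cycling a fixed list of ideas, "code editing" by adjusting a parameter dict, and "debugging" by retrying a single repair function a bounded number of times.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical: a "program" is a dict of parameters, standing in for a real codebase.
Program = dict


@dataclass
class Candidate:
    program: Program
    score: float
    hypothesis: str


def research(knowledge: list[str], step: int) -> str:
    """Stub for deep research: a real agent would retrieve external knowledge."""
    return knowledge[step % len(knowledge)]


def implement(program: Program, hypothesis: str) -> Program:
    """Stub for code editing: apply a 'key:delta' hypothesis to a copy of the program."""
    key, delta = hypothesis.split(":")
    new = dict(program)
    new[key] = new.get(key, 0.0) + float(delta)
    return new


def debug_and_run(program: Program,
                  evaluate: Callable[[Program], float],
                  repair: Callable[[Program], Program],
                  max_attempts: int = 3) -> Optional[tuple[Program, float]]:
    """Run the candidate; on failure, repair and retry up to max_attempts times."""
    for _ in range(max_attempts):
        try:
            return program, evaluate(program)
        except ValueError:
            program = repair(program)
    return None  # unrecoverable: the caller discards this candidate


def evolve_sketch(initial: Program, evaluate, repair, knowledge, iterations):
    """Keep the best-scoring candidate; every iteration builds on the current best."""
    best = Candidate(initial, evaluate(initial), "initial")
    history = [best]
    for step in range(iterations):
        hypothesis = research(knowledge, step)
        result = debug_and_run(implement(best.program, hypothesis), evaluate, repair)
        if result is None:
            continue
        program, score = result
        cand = Candidate(program, score, hypothesis)
        history.append(cand)
        if score > best.score:  # feedback: only improvements replace the incumbent
            best = cand
    return best, history


# Toy problem (hypothetical): maximize a quadratic; x > 5 simulates a runtime error.
def objective(p: Program) -> float:
    if p["x"] > 5.0:
        raise ValueError("x out of range")
    return -((p["x"] - 3.0) ** 2) - ((p["y"] + 1.0) ** 2)


def repair(p: Program) -> Program:
    return {**p, "x": min(p["x"], 5.0)}


IDEAS = ["x:2.0", "y:-0.5", "x:4.0", "y:-0.5", "x:1.0"]

if __name__ == "__main__":
    best, history = evolve_sketch({"x": 0.0, "y": 0.0}, objective, repair, IDEAS, 5)
    print(best.program, best.score)
```

In this toy run the third idea (`x:4.0`) triggers the simulated error, is repaired, and is evaluated but not kept because it scores worse than the incumbent. This mirrors the point that ideas are validated by execution rather than accepted on proposal alone.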