DeepCode: Open Agentic Coding

Recent advances in large language models (LLMs) have given rise to powerful coding agents, making it possible for code assistants to evolve into code engineers. However, existing methods still struggle to achieve high-fidelity document-to-codebase synthesis (for example, turning a scientific paper into code), primarily because of a fundamental conflict between information overload and the context bottleneck of LLMs. In this work, we introduce DeepCode, a fully autonomous framework that addresses this challenge through principled information-flow management. By treating repository synthesis as a channel optimization problem, DeepCode orchestrates four information operations to maximize task-relevant signal under a finite context budget: source compression via blueprint distillation, structured indexing using stateful code memory, conditional knowledge injection via retrieval-augmented generation, and closed-loop error correction. Extensive evaluations on the PaperBench benchmark show that DeepCode achieves state-of-the-art performance, decisively outperforming leading commercial agents such as Cursor and Claude Code and, crucially, surpassing PhD-level human experts from top institutes on key reproduction metrics. By systematically transforming paper specifications into production-grade implementations comparable in quality to those of human experts, this work establishes new foundations for autonomous scientific reproduction that can accelerate research evaluation and discovery.
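
The idea of maximizing task-relevant signal under a finite context budget can be illustrated with a toy example. The sketch below is not DeepCode's implementation: the `Snippet` type, the `pack_context` function, the relevance scores, and the token counts are all hypothetical, and the greedy relevance-per-token heuristic is an assumption chosen only for illustration. The four candidate items loosely mirror the four information operations named in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """A hypothetical unit of context competing for space in the prompt."""
    name: str
    tokens: int       # assumed token cost, must be positive
    relevance: float  # assumed task-relevance score


def pack_context(snippets, budget):
    """Greedily select snippets by relevance per token until the budget is used.

    This is a simple knapsack-style heuristic, not an optimal solver.
    """
    ranked = sorted(snippets, key=lambda s: s.relevance / s.tokens, reverse=True)
    chosen = []
    remaining = budget
    for snippet in ranked:
        if snippet.tokens <= remaining:
            chosen.append(snippet)
            remaining -= snippet.tokens
    return chosen


# Illustrative candidates; every number here is made up.
candidates = [
    Snippet("blueprint_summary", 400, 0.9),    # source compression
    Snippet("code_memory_index", 300, 0.6),    # structured indexing
    Snippet("retrieved_reference", 500, 0.5),  # conditional knowledge injection
    Snippet("last_error_log", 200, 0.8),       # closed-loop error correction
]

selected = pack_context(candidates, budget=1000)
for snippet in selected:
    print(snippet.name, snippet.tokens)
```

With a 1000-token budget, the heuristic keeps the three densest items and drops the retrieved reference, which is too large for the remaining space. A real system would presumably decide such trade-offs per generation step rather than once.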