InfCode: Adversarial Iterative Refinement of Tests and Patches for Reliable Software Issue Resolution

Large language models have advanced software engineering automation, yet resolving real-world software issues remains difficult because it requires repository-level reasoning, accurate diagnostics, and strong verification signals. Existing agent-based and pipeline-based methods often rely on insufficient tests, which can yield patches that pass verification but fail to fix the underlying defect. We present InfCode, an adversarial multi-agent framework for automated repository-level issue resolution. InfCode iteratively refines both tests and patches through adversarial interaction between a Test Patch Generator and a Code Patch Generator, while a Selector agent identifies the most reliable fix. The framework runs inside a containerized environment that supports realistic repository inspection, modification, and validation. Experiments on SWE-bench Lite and SWE-bench Verified using models such as DeepSeek-V3 and Claude 4.5 Sonnet show that InfCode consistently outperforms strong baselines. It achieves a 79.4% resolution rate on SWE-bench Verified, establishing a new state of the art. We have released InfCode as an open-source project at https://github.com/Tokfinity/InfCode.
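
To make the adversarial interaction concrete, the following is a minimal toy sketch of how a test generator, a patch generator, and a selector might alternate. It is not the InfCode implementation: there are no LLM calls or containers, patches are modeled as plain Python functions, and every name here (`strengthen_tests`, `repair`, `select`, `adversarial_refine`, the `oracle_cases` pool) is a hypothetical stand-in chosen for illustration.

```python
from typing import Callable, List, Optional, Tuple

Patch = Callable[[int], int]   # a candidate fix, modeled as a function (assumption)
Test = Tuple[int, int]         # (input, expected output)


def passes(patch: Patch, tests: List[Test]) -> bool:
    """Return True if the patch satisfies every test."""
    return all(patch(x) == expected for x, expected in tests)


def strengthen_tests(patch: Patch, tests: List[Test],
                     oracle_cases: List[Test]) -> List[Test]:
    """Stand-in for a Test Patch Generator: add one case the patch gets wrong."""
    for case in oracle_cases:
        if case not in tests and patch(case[0]) != case[1]:
            return tests + [case]
    return tests  # no new failing case found


def repair(candidates: List[Patch], tests: List[Test]) -> Optional[Patch]:
    """Stand-in for a Code Patch Generator: first candidate passing all tests."""
    for patch in candidates:
        if passes(patch, tests):
            return patch
    return None


def select(history: List[Patch], tests: List[Test]) -> Patch:
    """Stand-in for a Selector: keep the patch that passes the most tests."""
    return max(history, key=lambda p: sum(p(x) == e for x, e in tests))


def adversarial_refine(candidates: List[Patch], seed_tests: List[Test],
                       oracle_cases: List[Test], max_rounds: int = 5):
    """Alternate test strengthening and patch repair until neither side moves."""
    tests = list(seed_tests)
    history: List[Patch] = []
    patch = repair(candidates, tests)
    for _ in range(max_rounds):
        if patch is None:
            break
        history.append(patch)
        stronger = strengthen_tests(patch, tests, oracle_cases)
        if stronger == tests:
            break  # the test side cannot break the current patch
        tests = stronger
        patch = repair(candidates, tests)
    best = select(history, tests) if history else None
    return best, tests


# Toy issue: "abs() returns negative values for negative inputs".
def identity(x: int) -> int:      # passes the weak seed test, misses the bug
    return x

def negate(x: int) -> int:        # wrong for positive inputs
    return -x

def correct_abs(x: int) -> int:   # the real fix
    return x if x >= 0 else -x

best, final_tests = adversarial_refine(
    candidates=[identity, negate, correct_abs],
    seed_tests=[(3, 3)],
    oracle_cases=[(3, 3), (-2, 2), (0, 0)],
)
```

With only the seed test `(3, 3)`, the `identity` candidate would be accepted even though it does not fix the defect. Adding the adversarial case `(-2, 2)` rules it out, which is the failure mode of insufficient tests that the abstract describes.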
