Vibe Reasoning: Eliciting Frontier AI Mathematical Capabilities -- A Case Study on IMO 2025 Problem 6

We introduce Vibe Reasoning, a human-AI collaborative paradigm for solving complex mathematical problems. Our key insight is that frontier AI models already possess the knowledge required to solve challenging problems -- they simply do not know how, what, or when to apply it. Vibe Reasoning transforms AI's latent potential into manifested capability through generic meta-prompts, agentic grounding, and model orchestration. We demonstrate this paradigm through IMO 2025 Problem 6, a combinatorial optimization problem where autonomous AI systems publicly reported failures. Our solution combined GPT-5's exploratory capabilities with Gemini 3 Pro's proof strengths, leveraging agentic workflows with Python code execution and file-based memory, to derive both the correct answer (2112) and a rigorous mathematical proof. Through iterative refinement across multiple attempts, we discovered the necessity of agentic grounding and model orchestration, while human prompts evolved from problem-specific hints to generic, transferable meta-prompts. We analyze why capable AI fails autonomously, how each component addresses specific failure modes, and extract principles for effective vibe reasoning. Our findings suggest that lightweight human guidance can unlock frontier models' mathematical reasoning potential. This is ongoing work; we are developing automated frameworks and conducting broader evaluations to further validate Vibe Reasoning's generality and effectiveness.
