Neural Theorem Proving (NTP) employs LLMs to automate formal proofs in proof assistants. While LLMs have achieved relatively remarkable success in informal reasoning tasks using natural languages, the transition to mechanized formal theorem proving presents persistent challenges. Mechanized proof languages often contain many syntactic constructs and diverse, specialized proof tactics, which facilitate expert use but have no direct counterpart in informal mathematical proofs. These prover-specific idioms represent an additional burden for LLM-based NTPs that might be otherwise successful in generating informal proofs. Seeking to bridge this gap between formal proof construction and informal reasoning, in order to better facilitate NTP, this work approaches these challenges from a language design perspective. We look at common reasoning patterns in informal proofs and in existing mechanized proofs, and design Minilang -- a minimalist proof language that captures these reasoning patterns. In contrast to proof languages (informal and formal) that often feature a large collection of operations with unclear semantic boundaries, Minilang is deliberately kept minimalist -- its core design comprises only 10 operations, each with clear semantic distinctions. We further develop a rule-based translator from Isabelle's language (Isar) to Minilang, translating ~340K existing proofs with an ~85% success rate. Using this translated corpus, we finetune two LLMs to compare machine learning performance on Minilang versus the original Isar. Experiments show Minilang benefits the two LLMs by improving the pass@1 success rate on the PISA benchmark by up to 20/29 percentage points in comparison to the Isar-based LLMs w/wo Sledgehammer. The pass@1 rate reaches 69.1%, exceeding the prior work Baldur's pass@64 (65.7%); the pass@8 rate reaches 79.2%, exceeding the SOTA on PISA (71.0%) achieved by Magnushammer.


翻译:神经定理证明(NTP)利用大型语言模型(LLMs)在证明助手中自动化生成形式化证明。尽管LLMs在使用自然语言的非形式推理任务中已取得相对显著的成就,但转向机械化的形式定理证明仍存在持续挑战。机械化证明语言通常包含大量语法结构和多样化的专用证明策略(tactics),这些虽便于专家使用,但在非形式数学证明中并无直接对应物。这些证明器特有的惯用法对基于LLM的NTP构成了额外负担,而后者本可能在生成非形式证明方面取得成功。为了弥合形式化证明构建与非形式推理之间的鸿沟,从而更好地促进NTP,本研究从语言设计的角度应对这些挑战。我们考察了非形式证明和现有机械化证明中的常见推理模式,并设计了Minilang——一种捕获这些推理模式的极简证明语言。与通常包含大量语义边界模糊的操作的证明语言(无论非形式还是形式)不同,Minilang被刻意保持极简——其核心设计仅包含10种操作,每种操作都具有清晰的语义区分。我们进一步开发了一个从Isabelle语言(Isar)到Minilang的基于规则的翻译器,以约85%的成功率翻译了约34万条现有证明。利用此翻译后的语料库,我们微调了两个LLM,以比较它们在Minilang与原始Isar上的机器学习性能。实验表明,与基于Isar的LLM(无论是否使用Sledgehammer)相比,Minilang使两个LLM受益,在PISA基准测试上将pass@1成功率提升了高达20/29个百分点。pass@1率达到69.1%,超过了先前工作Baldur的pass@64(65.7%);pass@8率达到79.2%,超过了Magnushammer在PISA上实现的SOTA(71.0%)。

0
下载
关闭预览

相关内容

Top
微信扫码咨询专知VIP会员