Baldur: 利用大语言模型全方位生成和修理</s> (Baldur: Whole-Proof Generation and Repair with Large Language Models)

Formally verifying software properties is a highly desirable but labor-intensive task. Recent work has developed methods to automate formal verification using proof assistants, such as Coq and Isabelle/HOL, e.g., by training a model to predict one proof step at a time, and using that model to search through the space of possible proofs. This paper introduces a new method to automate formal verification: We use large language models, trained on natural language text and code and fine-tuned on proofs, to generate whole proofs for theorems at once, rather than one step at a time. We combine this proof generation model with a fine-tuned repair model to repair generated proofs, further increasing proving power. As its main contributions, this paper demonstrates for the first time that: (1) Whole-proof generation using transformers is possible and is as effective as search-based techniques without requiring costly search. (2) Giving the learned model additional context, such as a prior failed proof attempt and the ensuing error message, results in proof repair and further improves automated proof generation. (3) We establish a new state of the art for fully automated proof synthesis. We reify our method in a prototype, Baldur, and evaluate it on a benchmark of 6,336 Isabelle/HOL theorems and their proofs. In addition to empirically showing the effectiveness of whole-proof generation, repair, and added context, we show that Baldur improves on the state-of-the-art tool, Thor, by automatically generating proofs for an additional 8.7% of the theorems. Together, Baldur and Thor can prove 65.7% of the theorems fully automatically. This paper paves the way for new research into using large language models for automating formal verification.

翻译：正式核查软件属性是一项非常可取但劳动密集型的任务。最近的工作已经开发出一些方法,用证据助理(如Coq和Isabelle/HOL)进行正式核查自动化,例如,培训一个模型来一次预测一个证据步骤,并使用该模型在可能的证明空间中搜索可能的证明。本文引入了一个新的方法,使正式核查自动化:我们使用大型语言模型,在自然语言文本和代码方面受过培训,并精细调整了校准,以便一次性地而不是一次地一次地为理论库生成整个证据。我们把这一证据生成模型与一个经过精细调整的修理模型结合起来,以修复生成的证据,进一步增强验证能力。作为其主要贡献,本文首次表明:(1) 使用变压器进行全体核查是可能的,而且与基于搜索的技术一样有效,而无需花费昂贵的搜索。(2) 向所学的模型附加了额外的背景,例如以前的证明尝试失败和随后的错误信息,导致证据的修复和进一步的自动化证据生成。(3) 我们为完全的自动证据合成而设计新状态,我们用它来进行新的证据合成。我们用原型模型、BALL36 展示了整个实验性检验和实验性检验方法,我们用整个模型来展示了整个的模型, 展示了整个模型, 展示了整个的模型,我们用来展示了整个工具的模型, 展示了整个实验性地展示了它。</s>

相关内容

Automator

关注 5

Automator是苹果公司为他们的Mac OS X系统开发的一款软件。 只要通过点击拖拽鼠标等操作就可以将一系列动作组合成一个工作流，从而帮助你自动的（可重复的）完成一些复杂的工作。Automator还能横跨很多不同种类的程序，包括：查找器、Safari网络浏览器、iCal、地址簿或者其他的一些程序。它还能和一些第三方的程序一起工作，如微软的Office、Adobe公司的Photoshop或者Pixelmator等。

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日