千行代码之鉴：面向大规模程序验证的潜在运行时错误引导规约合成 (A Tale of 1001 LoC: Potential Runtime Error-Guided Specification Synthesis for Verifying Large-Scale Programs)

Fully automated verification of large-scale software and hardware systems is arguably the holy grail of formal methods. Large language models (LLMs) have recently demonstrated their potential for enhancing the degree of automation in formal verification by, e.g., generating formal specifications as essential to deductive verification, yet exhibit poor scalability due to long-context reasoning limitations and, more importantly, the difficulty of inferring complex, interprocedural specifications. This paper presents Preguss -- a modular, fine-grained framework for automating the generation and refinement of formal specifications. Preguss synergizes between static analysis and deductive verification by steering two components in a divide-and-conquer fashion: (i) potential runtime error-guided construction and prioritization of verification units, and (ii) LLM-aided synthesis of interprocedural specifications at the unit level. We show that Preguss substantially outperforms state-of-the-art LLM-based approaches and, in particular, it enables highly automated RTE-freeness verification for real-world programs with over a thousand LoC, with a reduction of 80.6%~88.9% human verification effort.

翻译：大规模软硬件系统的全自动验证无疑是形式化方法领域的圣杯。近期，大型语言模型（LLM）通过生成形式化规约（作为演绎验证的关键要素）展现了提升形式化验证自动化程度的潜力，但由于长上下文推理能力的局限，更重要的是推断复杂跨过程规约的困难，其可扩展性较差。本文提出Preguss——一个模块化、细粒度的框架，用于自动化生成和精化形式化规约。Preguss通过以分治方式引导两个组件，实现了静态分析与演绎验证的协同：（i）基于潜在运行时错误引导的验证单元构建与优先级排序，以及（ii）在单元级别上借助LLM辅助合成跨过程规约。我们证明，Preguss显著优于当前最先进的基于LLM的方法，特别是它能够对超过一千行代码的实际程序实现高度自动化的无运行时错误验证，同时将人工验证工作量减少80.6%~88.9%。