Identifying the cause of a proof failure during deductive verification of programs is hard: it may be due to an incorrectness in the program, an incompleteness in the program annotations, or an incompleteness of the prover. The changes needed to resolve a proof failure depend on its category, but the prover cannot provide any help on the categorisation. When using an SMT solver to discharge a proof obligation, that solver can propose a model from a failed attempt, from which a possible counterexample can be derived. But the counterexample may be invalid, in which case it may add more confusion than help. To check the validity of a counterexample and to categorise the proof failure, we propose the comparison between the run-time assertion-checking (RAC) executions under two different semantics, using the counterexample as an oracle. The first RAC execution follows the normal program semantics, and a violation of a program annotation indicates an incorrectness in the program. The second RAC execution follows a novel "giant-step" semantics that does not execute loops nor function calls but instead retrieves return values and values of modified variables from the oracle. A violation of the program annotations only observed under giant-step execution characterises an incompleteness of the program annotations. We implemented this approach in the Why3 platform for deductive program verification and evaluated it using examples from prior literature.
翻译:在对程序进行扣减核查时,很难辨别证明失败的原因:这可能是由于程序不正确,程序说明不完整,或证明人不完全。解决证明失败所需的修改取决于其类别,但证明人无法在分类上提供任何帮助。当使用 SMT 解答器履行证明义务时,解答人可以从失败尝试中提出一个模型,从中可以得出一个可能的反示例。但反示例可能是无效的,在这种情况下,它可能增加更多的混乱而不是帮助。为了检查反示例的有效性和确定证明失败,我们建议用两个不同的语义来比较运行时间校正检查(RAC)处决,使用反示例作为解析符。第一次RAC执行可以遵循正常的程序语义和违反程序说明,这表明程序有不正确之处。第二次RAC执行过程遵循的是新颖的“关键步骤”的语义,如果它不执行回溯逻辑或运行前流程的变数,我们只能从执行过程的变数中检索这个变数。