Formal verification techniques aim to prove the correctness of a computer program with respect to a formal specification. However, the expertise and effort they require, together with scalability issues, have limited their practical application. In recent years, tremendous progress in SAT and SMT solvers has enabled a new generation of tools that promise to make formal verification more accessible to software engineers by automating most, if not all, of the verification process. The Dafny system is a prominent example of this trend. However, little evidence exists yet about its accessibility. To help fill this gap, we conducted 10 case studies in which we developed verified Dafny implementations of some real-world algorithms and data structures, to determine how accessible Dafny is to software engineers. We found that, on average, the amount of code written for specification and verification purposes is of the same order of magnitude as the traditional code written for implementation and testing purposes (a ratio of 1.14), an "overhead" that certainly pays off for high-integrity software. The performance of the Dafny verifier was impressive: on average, 2.4 proof obligations were generated per line of code written, and 24 ms were spent per proof obligation generated and verified. However, we also found that the manual work needed to write auxiliary verification code can be significant and difficult to predict and master. Hence, further automation and systematization of verification tasks are possible directions for future advances in the field.
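As a rough illustration of what the reported averages imply, the following Python sketch turns them into back-of-the-envelope estimates. It assumes, purely for illustration, that the averages scale linearly with program size, which the case studies do not claim; the function names and the example sizes are hypothetical.

```python
# Averages reported in the abstract above.
OBLIGATIONS_PER_LINE = 2.4   # proof obligations generated per line of code written
MS_PER_OBLIGATION = 24.0     # ms spent per proof obligation generated and verified
SPEC_TO_IMPL_RATIO = 1.14    # specification/verification code vs. implementation/testing code


def estimate_verification_ms(lines_of_code: int) -> float:
    """Estimated verifier time in ms (assumption: the averages scale linearly)."""
    return lines_of_code * OBLIGATIONS_PER_LINE * MS_PER_OBLIGATION


def estimate_spec_lines(impl_and_test_lines: int) -> float:
    """Estimated specification/verification code size (same linearity assumption)."""
    return impl_and_test_lines * SPEC_TO_IMPL_RATIO


if __name__ == "__main__":
    # Hypothetical program sizes, chosen only to show the arithmetic.
    print(f"{estimate_verification_ms(500) / 1000:.1f} s of verifier time for 500 lines")
    print(f"{estimate_spec_lines(200):.0f} lines of specification and verification code")
```

Under these assumptions, each line of code costs about 2.4 × 24 ms ≈ 58 ms of verifier time. The estimate covers only the automated part; the manual effort of writing auxiliary verification code, which we found hard to predict, is not captured by such a calculation.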