Automated specification-based testing has a long history, and several notable tools have emerged. For example, QuickCheck for Haskell focuses on testing against user-provided properties. Others, such as JMLUnit, use specifications in the form of pre- and post-conditions to drive testing. An interesting (and under-explored) question is how effective this approach is at finding bugs in practice. In general, one would assume that automated testing is less effective at bug finding than static verification. But how much less effective? To shed light on this question, we consider automated testing of programs written in Whiley, a language with first-class support for specifications. Although Whiley was originally designed with static verification in mind, we have anecdotally found automated testing for it surprisingly useful and cost-effective. For example, when an error is detected with automated testing, a counterexample is always provided. This has motivated the more rigorous empirical examination presented in this paper. To that end, we provide a technical discussion of the implementation behind an automated testing tool for Whiley. Here, a key usability concern is the ability to parameterise the input space, and we present novel approaches for references and lambdas. We then report on several large experiments investigating the tool's effectiveness at bug finding using a range of benchmarks, including a suite of 1800+ mutants. The results indicate that automated testing is effective in many cases, and that sampling offers useful performance benefits with only modest reductions in bug-finding capability. Finally, we report on some real-world uses of the tool where it has proved effective at finding bugs (such as in the standard library).