Extreme mutation testing in practice: An industrial case study

Mutation testing is used to evaluate the effectiveness of test suites. In recent years, a promising and computationally less expensive variation called extreme mutation testing has emerged. It identifies methods whose functionality can be removed entirely without the test suite noticing, even though the tests cover them. These methods are called pseudo-tested. In this paper, we compare the execution and analysis times for traditional and extreme mutation testing and discuss what they mean in practice. We look at how extreme mutation testing impacts current software development practices and discuss open challenges that need to be addressed to foster industry adoption. To this end, we conducted an industrial case study in which we ran traditional and extreme mutation testing on a large software project from the semiconductor industry that is covered by a test suite of more than 11,000 unit tests. In addition, we performed a qualitative analysis of 25 pseudo-tested methods and interviewed two experienced developers to see how they write unit tests and to gather their opinions on how useful the findings of extreme mutation testing are. Our results include execution times, scores, numbers of executed tests and mutators, reasons why methods are pseudo-tested, and an interview summary. We conclude that the shorter execution and analysis times are clearly noticeable in practice and show that extreme mutation testing supplements writing unit tests in conjunction with code coverage tools. We propose that pseudo-tested code should be highlighted in code coverage reports and that extreme mutation testing should be performed while writing unit tests rather than in a decoupled session. Future research should investigate how to perform extreme mutation testing while writing unit tests such that the results are available fast enough but still meaningful.
