Mutation testing has been demonstrated to be one of the most powerful fault-revealing tools in the tester's toolkit. Much previous work has implicitly assumed that it is sufficient to re-compute mutant suites for each release. Unfortunately, this makes mutation results inconsistent: mutation scores from different releases cannot be directly compared, which makes test improvement harder to measure. Furthermore, regular code change means that a mutant suite's relevance naturally degrades over time. We measure this degradation in relevance for 143,500 mutants in 4 non-trivial systems, finding that, on average, 52% degrade. We introduce a mutant brittleness measure and use it to audit software systems and their mutation suites. We also demonstrate how consistent-by-construction long-standing mutant suites can be identified with a 10x improvement in mutant relevance over an arbitrary test suite. Our results indicate that the research community should avoid re-computing mutant suites and focus instead on long-standing mutants, thereby improving the consistency and relevance of mutation testing.