Mutation testing is an effective approach to evaluating and strengthening software test suites, but its adoption is currently limited by the computational cost of executing mutants. Several strategies have been proposed to reduce this cost (known as mutation cost reduction strategies); however, none of them has proven effective in all scenarios, since they often require ad hoc manual selection and configuration depending on the software under test (SUT). In this paper, we propose a novel multi-objective evolutionary hyper-heuristic approach, dubbed Sentinel, to automate the generation of optimal cost reduction strategies for every new SUT. We evaluate Sentinel in a thorough empirical study involving 40 releases of 10 real-world open-source software systems, using both baseline and state-of-the-art strategies as benchmarks. We execute a total of 4,800 experiments and evaluate their results with both quality indicators and statistical significance tests, following the most recent best practice in the literature. The results show that strategies generated by Sentinel outperform the baseline strategies in 95% of the cases, always with large effect sizes. They also obtain statistically significantly better results than state-of-the-art strategies in 88% of the cases, with large effect sizes for 95% of them. In addition, our study reveals that the mutation strategies generated by Sentinel for a given software version can be used without any loss in quality for subsequently developed versions in 95% of the cases. These results show that Sentinel is able to automatically generate mutation strategies that reduce the cost of mutation testing without affecting its testing effectiveness (i.e., mutation score), thus relieving the tester of the burden of manually selecting and configuring strategies for each SUT.