Mutation testing is an established fault-based testing technique. It operates by seeding faults into the program under test and asking developers to write tests that reveal these faults. Such tests have the potential to reveal a large number of faults (those that couple with the seeded ones) and are thus deemed important. To this end, mutation testing should seed faults that are both "natural", in a sense easily understood by developers, and strong (i.e., have a high chance of revealing faults). To achieve this, we propose using pre-trained generative language models (i.e., CodeBERT), which can produce developer-like code that operates similarly, but not identically, to the target code. This means that the models can seed natural faults, thereby offering opportunities to perform mutation testing. We realise this idea by implementing $\mu$BERT, a mutation testing technique that uses CodeBERT, and empirically evaluate it on 689 faulty program versions. Our results show that the fault revelation ability of $\mu$BERT is higher than that of a state-of-the-art mutation testing tool (PiTest), yielding tests that have up to 17% higher fault detection potential than those of PiTest. Moreover, we observe that $\mu$BERT can complement PiTest: it detects 47 bugs missed by PiTest, while PiTest finds 13 bugs missed by $\mu$BERT.
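To make the idea of model-seeded faults concrete, the following is a minimal sketch, not the $\mu$BERT implementation. It assumes that mutants are obtained by masking one token at a time and asking a language model for replacements, which is a natural way to use a masked model such as CodeBERT but is not spelled out above. The names `tokenize`, `generate_mutants` and `toy_predict` are hypothetical, and `toy_predict` is a hand-written stand-in for the model so that the sketch runs without any dependencies.

```python
import re
from typing import Callable, List

MASK = "<mask>"  # placeholder token; the real mask token depends on the model


def tokenize(code: str) -> List[str]:
    # Very rough lexer: identifiers, integers, two-character operators, single symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|[<>=!]=|&&|\|\||\S", code)


def generate_mutants(
    code: str,
    predict: Callable[[List[str]], List[str]],
    top_k: int = 3,
) -> List[str]:
    """Mask each token in turn and substitute the predictor's suggestions.

    Suggestions equal to the original token are discarded, since they
    reproduce the original program rather than seed a fault.
    """
    tokens = tokenize(code)
    mutants: List[str] = []
    for i, original in enumerate(tokens):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        for candidate in predict(masked)[:top_k]:
            if candidate == original:
                continue  # same code as the original, not a mutant
            mutant = " ".join(tokens[:i] + [candidate] + tokens[i + 1:])
            if mutant not in mutants:
                mutants.append(mutant)
    return mutants


def toy_predict(masked_tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for a masked language model (assumption).

    It proposes comparison operators only when the mask sits between two
    operands; a real model would rank candidates by predicted probability.
    """
    i = masked_tokens.index(MASK)
    prev = masked_tokens[i - 1] if i > 0 else ""
    nxt = masked_tokens[i + 1] if i + 1 < len(masked_tokens) else ""
    if re.fullmatch(r"\w+", prev) and re.fullmatch(r"\w+", nxt):
        return ["<", "<=", ">"]
    return []


print(generate_mutants("a < b", toy_predict))  # ['a <= b', 'a > b']
```

In this sketch the predictor is passed in as a function, so the toy stand-in could be swapped for a call to an actual pre-trained model without changing the mutant-generation loop.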