Test suite minimization (TSM) is typically used to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources, while maintaining the fault detection capability of the test suite. Though many TSM approaches exist, most of them rely on code coverage (white-box) or model-based features, which are not always available to test engineers. Recently, black-box TSM approaches that rely only on test code, such as ATM and FAST-R, have been proposed. Though ATM achieves a better trade-off between effectiveness and efficiency than FAST-R, it does not scale well to large software systems because its execution time increases rapidly with test suite size. To address scalability, we propose LTM, a scalable, black-box, similarity-based TSM approach built on language models. To measure similarity, we investigated three pre-trained language models (CodeBERT, GraphCodeBERT, and UniXcoder) for extracting embeddings of test code (Java test methods), and computed two similarity measures on these embeddings: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only cheaper to compute but also better at guiding the Genetic Algorithm (GA) used to minimize test suites, thereby reducing minimization time. Experimental results showed that the best configuration of LTM (UniXcoder with Cosine Similarity) outperformed the best two configurations of ATM: it achieved significantly higher fault detection rates (0.84 versus 0.81, on average) and, more importantly, ran much faster than ATM (26.73 minutes versus 72.75 minutes, on average), in terms of both preparation time (up to two orders of magnitude faster) and minimization time (one order of magnitude faster).
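The abstract describes the pipeline only at a high level, so the sketch below illustrates it under several assumptions. Test-method embeddings (for example, UniXcoder outputs) are taken as already-computed vectors; Euclidean distance is turned into a similarity via 1/(1+d), a hypothetical choice; and the fitness (mean pairwise similarity of the selected tests, to be minimized), the fixed `budget`, and the crossover and mutation operators are illustrative assumptions, not LTM's actual GA design. All function names are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between two embedding vectors (0.0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_matrix(embeddings: List[Vector], measure: str = "cosine") -> List[List[float]]:
    """Precompute all pairwise similarities once, before the search starts."""
    if measure == "cosine":
        sim_fn = cosine_similarity
    elif measure == "euclidean":
        # Assumption: map distance to a similarity in (0, 1] so that both
        # measures can be minimized by the same fitness function.
        def sim_fn(a: Vector, b: Vector) -> float:
            return 1.0 / (1.0 + euclidean_distance(a, b))
    else:
        raise ValueError(f"unknown measure: {measure}")
    n = len(embeddings)
    return [[sim_fn(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]


def mean_pairwise_similarity(subset: Tuple[int, ...], sim: List[List[float]]) -> float:
    """Assumed fitness to minimize: lower means the selected tests are more diverse."""
    pairs = list(combinations(subset, 2))
    if not pairs:
        return 0.0
    return sum(sim[i][j] for i, j in pairs) / len(pairs)


def minimize_suite(embeddings, budget, measure="cosine",
                   generations=50, pop_size=20, mutation_rate=0.3, seed=0):
    """Select `budget` tests with a simple elitist GA; returns sorted test indices."""
    rng = random.Random(seed)
    n = len(embeddings)
    sim = similarity_matrix(embeddings, measure)

    def fitness(ind):
        return mean_pairwise_similarity(ind, sim)

    def crossover(p1, p2):
        # Child draws `budget` distinct tests from the union of both parents.
        pool = sorted(set(p1) | set(p2))
        return tuple(sorted(rng.sample(pool, budget)))

    def mutate(ind):
        # Swap one selected test for one currently unselected test.
        outside = [t for t in range(n) if t not in ind]
        if not outside:
            return ind
        genes = list(ind)
        genes[rng.randrange(len(genes))] = rng.choice(outside)
        return tuple(sorted(genes))

    # Each individual is a sorted tuple of `budget` distinct test indices.
    population = [tuple(sorted(rng.sample(range(n), budget))) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: max(2, pop_size // 2)]  # elitism: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = crossover(p1, p2)
            if rng.random() < mutation_rate:
                child = mutate(child)
            children.append(child)
        population = parents + children
    return list(min(population, key=fitness))
```

For example, `minimize_suite(embeddings, budget=len(embeddings) // 2)` would keep half of the suite. Because the similarity matrix is computed once up front, each fitness evaluation is a table lookup rather than a model call, which is one way a cheap similarity measure can shorten minimization time.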