Regression testing is an essential activity for ensuring that code changes do not adversely affect existing functionality. With the wide adoption of Continuous Integration (CI), which increases the frequency of software builds, running all tests can be time-consuming and resource-intensive. To alleviate this problem, Test case Selection and Prioritization (TSP) techniques have been proposed to improve regression testing by selecting and prioritizing test cases so that developers receive early feedback. In recent years, researchers have relied on Machine Learning (ML) techniques to achieve effective TSP (ML-based TSP). Such techniques help combine information about test cases, drawn from partial and imperfect sources, into accurate prediction models. This work presents a systematic literature review of ML-based TSP techniques, aiming to analyze the state of the art in depth and to identify avenues for future research. To that end, we analyze 29 primary studies published from 2006 to 2020, identified through a systematic and documented process. The paper addresses five research questions covering variations in ML-based TSP techniques, the feature sets used for training and testing ML models, the metrics used for evaluating the techniques, the performance of the techniques, and the reproducibility of the published studies.