Most software companies maintain extensive test suites and re-run parts of them continuously to ensure that recent changes have no adverse effects. Since test suites are costly to execute, industry needs methods for test case prioritisation (TCP). Recent TCP methods use machine learning (ML) to exploit the information known about the system under test (SUT) and its test cases. However, the value added by ML-based TCP methods should be critically assessed with respect to the cost of collecting that information. This paper analyses two decades of TCP research and presents a taxonomy of 91 information attributes that have been used. The attributes are classified with respect to their information sources and the characteristics of their extraction process. Based on this taxonomy, TCP methods validated with industrial data and those applying ML are analysed in terms of information availability, attribute combination, and the definition of data features suitable for ML. Three factors are identified that might hamper the industrial applicability of ML-based TCP: relying on a high number of information attributes, assuming easy access to SUT code, and assuming simplified testing environments. The resulting taxonomy, TePIA, provides a reference framework to unify terminology and evaluate alternatives considering the cost-benefit of the information attributes.