In recent years, machine learning techniques have been applied to a growing number of domains, including software engineering and, in particular, software quality assurance. Important applications include software defect prediction and test case selection and prioritization. The ability to predict which components of a large software system are most likely to contain the largest numbers of faults in the next release helps teams manage projects better, including estimating possible release delays early, and guides corrective actions affordably to improve software quality. However, developing robust fault prediction models is challenging, and many techniques have been proposed in the literature. Closely related to estimating the defect-prone parts of a software system is the question of how to select and prioritize test cases; indeed, test case prioritization has been extensively researched as a means of reducing the time needed to discover regressions in software. In this survey, we discuss various approaches to both fault prediction and test case prioritization, and explain how recent studies use deep learning algorithms for fault prediction to bridge the gap between program semantics and fault prediction features. We also review recently proposed machine learning methods for test case prioritization (TCP) and their ability to reduce the cost of regression testing without degrading fault detection capability.
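To make the goal of TCP, discovering regressions sooner, a little more concrete, here is a minimal sketch. It assumes that each test already has a failure score from some hypothetical predictor; the test names, scores and fault sets below are invented for illustration and are not results. The code orders tests by descending score and evaluates an order with APFD (Average Percentage of Faults Detected). APFD is a metric commonly used in the TCP literature, but it is not named in this abstract, so treat its use here as an illustrative choice.

```python
from typing import Dict, List, Set


def prioritize(scores: Dict[str, float]) -> List[str]:
    """Order test IDs by descending predicted failure score (ties broken by name).

    The scores are assumed to come from a hypothetical ML model; any
    real TCP approach may use very different signals.
    """
    return sorted(scores, key=lambda t: (-scores[t], t))


def apfd(order: List[str], detects: Dict[str, Set[str]]) -> float:
    """Average Percentage of Faults Detected for a given test order.

    `detects` maps each test ID to the set of faults it reveals, and
    `order` must be a permutation of its keys.
    APFD = 1 - (sum of first-detection positions) / (n * m) + 1 / (2 * n),
    where n is the number of tests and m the number of detected faults.
    """
    if set(order) != set(detects) or len(order) != len(detects):
        raise ValueError("order must be a permutation of the tests in detects")
    n = len(order)
    faults = set().union(*detects.values())
    m = len(faults)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one detected fault")
    first_pos: Dict[str, int] = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects[test]:
            # Record only the first (earliest) position that reveals each fault.
            first_pos.setdefault(fault, pos)
    return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)


if __name__ == "__main__":
    # Invented example data: which (hypothetical) faults each test reveals.
    detects = {"t1": set(), "t2": {"f1"}, "t3": {"f1", "f2"}, "t4": set()}
    scores = {"t1": 0.1, "t2": 0.4, "t3": 0.9, "t4": 0.2}
    print("original order APFD:", apfd(["t1", "t2", "t3", "t4"], detects))
    print("prioritized order:", prioritize(scores))
    print("prioritized APFD:", apfd(prioritize(scores), detects))
```

With these made-up numbers, running the most suspicious test first reveals both faults at position 1, which raises APFD from 0.5 to 0.875. This illustrates, under the stated assumptions, how a good prioritization shortens the time to detect regressions without running fewer tests.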
翻译:过去几年来,机器学习技术已应用于越来越多的应用领域,包括软件工程,特别是软件质量保证; 重要应用领域已经存在,例如软件缺陷预测或测试案例选择和优先排序; 能够预测大型软件系统中哪些组件最有可能包含下一期版本中数量最多的故障,有助于更好地管理项目,包括尽早估计可能的释放延误,以及可负担地指导纠正行动,以提高软件的质量。然而,开发稳健的错误预测模型是一项艰巨的任务,文献中提出了许多技术。与估算软件系统中易出故障部分密切相关的是,如何选择和优先排序测试案例的问题,事实上,测试案例的优先排序已经广泛研究,以缩短发现软件回归所需的时间。在这次调查中,我们讨论了错误预测和测试案例优先排序方面的各种办法,并解释了最近研究的错误预测深层学习算法如何帮助弥合程序语义和错误预测特征之间的差距。我们还审查了最近提出的测试案例优先排序的机器学习方法(TCP),以及这些方法在不造成反向性测试的情况下降低回归测试成本的能力。