Moving from Cross-Project Defect Prediction to Heterogeneous Defect Prediction: A Partial Replication Study

Software defect prediction relies heavily on the metrics collected from software projects. Earlier studies often used machine learning techniques to build, validate, and improve bug prediction models using a set of metrics collected either within a project or across different projects. However, the techniques applied and the conclusions derived from those models are restricted by how closely the metric sets match. Knowledge from those models cannot be extended to a target project if the source projects have not collected enough overlapping metrics. To explore the feasibility of transferring knowledge across projects without common labeled metrics, we systematically integrated Heterogeneous Defect Prediction (HDP) by replicating and validating previously obtained results. Our main goal is to extend prior research, explore the feasibility of HDP, and compare its performance with that of its predecessor, Cross-Project Defect Prediction. We construct an HDP model on different publicly available datasets. Moreover, we propose a new ensemble voting approach in the HDP context to utilize the predictive power of multiple available datasets. The results of our experiments are comparable to those of the original study. We also explored the feasibility of HDP in real cases, and our results indicate that the HDP algorithm is infeasible in many cases because of its sensitivity to parameter selection. In general, our analysis gives insight into why and how to perform transfer learning from one domain to another and, in particular, provides a set of guidelines to help researchers and practitioners disseminate knowledge to the defect prediction domain.
