Context: Performance metrics are a core component of the evaluation of any machine learning model and are used to compare models and estimate their usefulness. Recent work has begun to question the validity of many performance metrics for this purpose in the context of software defect prediction. Objective: In this study, we explore the relationship between performance metrics and the cost saving potential of defect prediction models. We study whether performance metrics are suitable proxies for evaluating cost saving capabilities and derive a theory for the relationship between performance metrics and cost saving potential. Methods: We measure performance metrics and cost saving potential in defect prediction experiments. We use a multinomial logit model, decision trees, and random forests to model the relationship between the metrics and the cost savings. Results: We could not find a stable relationship between cost savings and performance metrics. We attribute this to the inability of performance metrics to account for the fact that a small proportion of very large software artifacts is the main driver of the costs. Conclusion: Any defect prediction study interested in finding the best prediction model must consider cost savings directly, because no reasonable claims regarding the economic benefits of defect prediction can be made otherwise.
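The Methods sentence describes modeling cost savings as a function of performance metrics. A minimal sketch of one such analysis, using a random forest classifier on entirely synthetic data (the metric names, category coding, and data shapes are illustrative assumptions, not the study's actual setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic feature matrix: each row holds performance metrics of one
# defect prediction experiment (e.g., recall, precision, F-measure, MCC).
# These columns are placeholders, not data from the study.
X = rng.random((500, 4))

# Synthetic outcome: a categorical cost-savings label per experiment
# (0 = costs increased, 1 = break-even, 2 = costs saved), generated
# at random here purely for illustration.
y = rng.integers(0, 3, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One of the model families named in the abstract; a multinomial logit
# or a single decision tree could be substituted in the same pipeline.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {acc:.3f}")
```

Because the labels above are independent of the features, accuracy near chance level (roughly 1/3 for three balanced classes) is expected; in the actual study, the analogous finding was that the metrics did not stably predict cost savings.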