Context: Software engineering has a problem: when we empirically evaluate competing prediction systems, we obtain conflicting results. Objective: To reduce the inconsistency amongst validation study results and to provide a more formal foundation for interpreting results, with a particular focus on continuous prediction systems. Method: A new framework is proposed for evaluating competing prediction systems based upon (1) an unbiased statistic, Standardised Accuracy, (2) testing the likelihood of a result relative to the baseline technique of random 'predictions', that is, guessing, and (3) calculation of effect sizes. Results: Previously published empirical evaluations of prediction systems are re-examined and the original conclusions shown to be unsafe. Additionally, even the strongest results are shown to have no more than a medium effect size relative to random guessing. Conclusions: Biased accuracy statistics such as MMRE are deprecated. By contrast, this new empirical validation framework leads to meaningful results. Such steps will assist in performing future meta-analyses and in providing more robust and usable recommendations to practitioners.
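To make the evaluation procedure concrete, the sketch below computes the three ingredients of the framework for a set of continuous predictions: the mean absolute residual (MAR), Standardised Accuracy (SA) relative to a random-guessing baseline, and an effect size. The specific formulas used, SA = 1 - MAR_Pi / MAR_P0 and Glass's delta = (MAR_Pi - MAR_P0) / s_P0, the Monte Carlo construction of the guessing baseline P0, and the example numbers are assumptions drawn from the usual presentation of these statistics rather than details quoted from this abstract; treat it as a minimal illustration, not the paper's reference implementation.

    import numpy as np

    def mar(actuals, predictions):
        """Mean Absolute Residual: mean |actual - predicted| over all cases."""
        actuals = np.asarray(actuals, dtype=float)
        predictions = np.asarray(predictions, dtype=float)
        return float(np.mean(np.abs(actuals - predictions)))

    def random_guessing_baseline(actuals, n_runs=1000, seed=None):
        """Monte Carlo estimate of the random-guessing baseline P0: each case is
        'predicted' by the actual value of another, randomly chosen case.
        Returns (MAR_P0, s_P0); taking s_P0 as the standard deviation of the
        guessing residuals is an assumed definition."""
        rng = np.random.default_rng(seed)
        actuals = np.asarray(actuals, dtype=float)
        n = len(actuals)
        residuals = []
        for _ in range(n_runs):
            # for each target case, pick a donor case other than itself
            donors = np.array([rng.choice(np.delete(np.arange(n), i)) for i in range(n)])
            residuals.append(np.abs(actuals - actuals[donors]))
        residuals = np.concatenate(residuals)
        return float(residuals.mean()), float(residuals.std(ddof=1))

    def evaluate(actuals, predictions, n_runs=1000, seed=None):
        """Standardised Accuracy and effect size relative to random guessing.

        SA    = 1 - MAR_Pi / MAR_P0        (improvement over guessing)
        delta = (MAR_Pi - MAR_P0) / s_P0   (Glass's delta; |delta| read against
                                            the usual small/medium/large bands)
        """
        mar_pi = mar(actuals, predictions)
        mar_p0, s_p0 = random_guessing_baseline(actuals, n_runs=n_runs, seed=seed)
        sa = 1.0 - mar_pi / mar_p0
        delta = (mar_pi - mar_p0) / s_p0
        return {"MAR": mar_pi, "MAR_P0": mar_p0, "SA": sa, "delta": delta}

    # Example: actual efforts vs. a model's estimates (hypothetical numbers)
    actual = [120, 340, 90, 560, 210, 75, 430, 300]
    estimated = [140, 300, 110, 500, 260, 60, 380, 330]
    print(evaluate(actual, estimated, seed=1))

Note that, unlike MMRE, neither statistic divides each residual by the actual value, which is commonly cited as the source of MMRE's bias towards systems that under-estimate; SA instead expresses accuracy as improvement over guessing, so an SA near zero (or a negligible effect size) signals a prediction system doing little better than chance.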