Several performance measures can be used to evaluate classification results: accuracy, F-measure, and many others. Can we say that some of them are better than others, or, ideally, choose one measure that is best in all situations? To answer this question, we conduct a systematic analysis of classification performance measures: we formally define a list of desirable properties and theoretically analyze which measures satisfy which properties. We also prove an impossibility theorem: some desirable properties cannot be satisfied simultaneously. Finally, we propose a new family of measures satisfying all desirable properties except one. This family includes the Matthews Correlation Coefficient and a so-called Symmetric Balanced Accuracy, which has not previously been used in the classification literature. We believe that our systematic approach gives practitioners an important tool for adequately evaluating classification results.
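As a minimal illustration of why the choice of measure matters, the sketch below computes three of the measures named above (accuracy, F1 as the balanced F-measure, and the Matthews Correlation Coefficient) from the counts of a binary confusion matrix. This is not the paper's framework. The `Confusion` class and the function names are hypothetical, and returning 0.0 when a denominator vanishes is a convention assumed for this sketch. Symmetric Balanced Accuracy is deliberately omitted because its definition is not given here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def accuracy(c: Confusion) -> float:
    # Fraction of all items that are classified correctly.
    return (c.tp + c.tn) / c.n


def f1(c: Confusion) -> float:
    # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    # Assumed convention: 0.0 when the denominator is zero.
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def mcc(c: Confusion) -> float:
    # Matthews Correlation Coefficient: Pearson correlation between the
    # true and predicted binary labels.
    num = c.tp * c.tn - c.fp * c.fn
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    # Undefined when any row/column sum is zero (e.g. a constant classifier);
    # assumed convention: return 0.0 in that case.
    return num / denom if denom else 0.0


if __name__ == "__main__":
    # Imbalanced data (5 positives, 95 negatives); the classifier always predicts "negative".
    constant = Confusion(tp=0, fp=0, fn=5, tn=95)
    for name, measure in [("accuracy", accuracy), ("f1", f1), ("mcc", mcc)]:
        print(name, round(measure(constant), 3))
    # accuracy 0.95
    # f1 0.0
    # mcc 0.0
```

In this example, accuracy rewards a classifier that ignores the input entirely, while F1 and MCC both assign it 0. This gap is the kind of behavior a systematic list of desirable properties is meant to expose.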