Multi-class prediction has gained popularity in recent years, so measuring goodness of fit has become a cardinal question that researchers often have to deal with. Several metrics are commonly used for this task. However, when choosing the right measure, one must consider that different use cases impose different constraints that govern this decision. A leading constraint, at least in \emph{real-world} multi-class problems, is imbalanced data: multi-categorical problems rarely provide symmetrical data. Hence, when we observe common KPIs (key performance indicators), e.g., Precision-Sensitivity or Accuracy, we can seldom translate the obtained numbers into the model's actual needs. We suggest generalizing the Matthews correlation coefficient to multiple dimensions. This generalization is based on a geometrical interpretation of the generalized confusion matrix.