Classification problems are essential statistical tasks that form the foundation of decision-making across various fields, including patient prognosis and treatment strategies for critical conditions. Consequently, evaluating the performance of classification models is of significant importance, and numerous evaluation metrics have been proposed. Among these, the Matthews correlation coefficient (MCC), also known as the phi coefficient, is widely recognized as a reliable metric that provides balanced measurements even in the presence of class imbalance. However, with the increasing prevalence of multiclass classification problems involving three or more classes, macro-averaged and micro-averaged extensions of MCC have been employed, despite a lack of clear definitions or established references for these extensions. In the present study, we propose a formal framework for MCC tailored to multiclass classification problems using macro-averaged and micro-averaged approaches. Moreover, discussions on the use of these extended MCCs for multiclass problems often rely solely on point estimates, potentially overlooking the statistical significance and reliability of the results. To address this gap, we introduce several methods for constructing asymptotic confidence intervals for the proposed metrics. Furthermore, we extend these methods to include the construction of asymptotic confidence intervals for differences in the proposed metrics, specifically for paired study designs. The utility of our methods is evaluated through comprehensive simulations and real-world data analyses.
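To make the two averaging schemes concrete, the following sketch computes macro-averaged MCC (the mean of one-vs-rest binary MCCs over classes) and micro-averaged MCC (binary MCC on the pooled one-vs-rest confusion counts). This is an illustrative reading of the macro/micro extensions, not the paper's formal definitions; all function names are our own.

```python
import numpy as np

def binary_mcc(tp, fp, fn, tn):
    """Binary MCC from confusion counts; returns 0.0 when the denominator vanishes."""
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def ovr_counts(y_true, y_pred, k):
    """One-vs-rest confusion counts (tp, fp, fn, tn) for class k."""
    t = np.asarray(y_true) == k
    p = np.asarray(y_pred) == k
    return (int(np.sum(t & p)), int(np.sum(~t & p)),
            int(np.sum(t & ~p)), int(np.sum(~t & ~p)))

def macro_mcc(y_true, y_pred):
    """Average the per-class one-vs-rest MCCs."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    return float(np.mean([binary_mcc(*ovr_counts(y_true, y_pred, k))
                          for k in classes]))

def micro_mcc(y_true, y_pred):
    """Pool the one-vs-rest counts across classes, then take one binary MCC."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    pooled = np.sum([ovr_counts(y_true, y_pred, k) for k in classes], axis=0)
    return binary_mcc(*pooled)
```

For example, with `y_true = [0, 0, 1, 1, 2, 2]` and `y_pred = [0, 0, 1, 2, 2, 2]`, the macro average weights each class equally while the micro average is driven by the pooled counts, so the two values differ whenever per-class error rates differ.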