Prevalence Threshold and bounds in the Accuracy of Binary Classification Systems

The accuracy of a binary classification system is the proportion of correct predictions, both positive and negative, made by a classification model or computational algorithm. Ranging from 0 (no accuracy) to 1 (perfect accuracy), it depends on several factors, notably the classification rule or algorithm used, the intrinsic characteristics of the tool performing the classification, and the relative frequency of the elements being classified. Several accuracy metrics exist, each with its own advantages in different classification scenarios. In this manuscript, we show that, relative to a perfect accuracy of 1, the positive prevalence threshold ($\phi_e$), a critical point of maximum curvature in the precision-prevalence curve, bounds the $F_{\beta}$ score between 1 and 1.8, 1.5 and 1.2 for $\beta$ values of 0.5, 1.0 and 2.0, respectively (so the $F_1$ score lies between 1 and 1.5), and the Fowlkes-Mallows Index (FM) between 1 and $\sqrt{2} \approx 1.414$. We likewise describe a novel negative prevalence threshold ($\phi_n$), the point of sharpest curvature in the negative predictive value-prevalence curve, such that $\phi_n > \phi_e$. The region between these two thresholds bounds the Matthews Correlation Coefficient (MCC) between $\sqrt{2}/2$ and $\sqrt{2}$. Conversely, the ratio of the maximum possible accuracy to the accuracy at any point below the prevalence threshold $\phi_e$ grows without bound as prevalence decreases. Though applications are numerous, the ideas discussed here may be used in computational complexity theory, artificial intelligence and medical screening, among others. Where computational time is a limiting resource, attaining the prevalence threshold in a binary classification system may suffice to yield accuracy comparable to that obtained at maximum prevalence.
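The $F_{\beta}$ and FM bounds can be checked numerically. The Python sketch below is not the author's code. It assumes the closed form $\phi_e = (\sqrt{\mathrm{TPR}\cdot\mathrm{FPR}} - \mathrm{FPR})/(\mathrm{TPR} - \mathrm{FPR})$ for the positive prevalence threshold and computes precision at a given prevalence with Bayes' theorem. It also reads each bound as the ratio of a metric at full prevalence ($\phi = 1$) to its value at $\phi_e$. Under these assumptions, the $F_{\beta}$ ratio simplifies to $(\beta^2 + \mathrm{TPR} + \sqrt{\mathrm{TPR}\cdot\mathrm{FPR}})/(\beta^2 + \mathrm{TPR})$, and the FM ratio simplifies to $\sqrt{1 + \sqrt{\mathrm{FPR}/\mathrm{TPR}}}$. As TPR and FPR both tend to 1, these approach $(\beta^2+2)/(\beta^2+1)$ (that is, 1.8, 1.5 and 1.2) and $\sqrt{2}$. All function names are hypothetical.

```python
import math


def prevalence_threshold(tpr: float, fpr: float) -> float:
    """Positive prevalence threshold phi_e (assumed closed form).

    (sqrt(TPR*FPR) - FPR) / (TPR - FPR) simplifies to
    sqrt(FPR) / (sqrt(TPR) + sqrt(FPR)), which also avoids 0/0 at TPR == FPR.
    """
    return math.sqrt(fpr) / (math.sqrt(tpr) + math.sqrt(fpr))


def precision(prevalence: float, tpr: float, fpr: float) -> float:
    """Positive predictive value at a given prevalence, via Bayes' theorem."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def f_beta(prec: float, rec: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1.0 + b2) * prec * rec / (b2 * prec + rec)


def fowlkes_mallows(prec: float, rec: float) -> float:
    """Geometric mean of precision and recall."""
    return math.sqrt(prec * rec)


def ratios_at_threshold(tpr: float, fpr: float, beta: float = 1.0):
    """Return (F_beta ratio, FM ratio): metric at prevalence 1 / metric at phi_e.

    Recall equals TPR at every prevalence, so only precision changes.
    """
    phi_e = prevalence_threshold(tpr, fpr)
    p_full = precision(1.0, tpr, fpr)  # equals 1.0 whenever tpr > 0
    p_e = precision(phi_e, tpr, fpr)
    f_ratio = f_beta(p_full, tpr, beta) / f_beta(p_e, tpr, beta)
    fm_ratio = fowlkes_mallows(p_full, tpr) / fowlkes_mallows(p_e, tpr)
    return f_ratio, fm_ratio


if __name__ == "__main__":
    # Near-chance limit (TPR -> 1, FPR -> 1): ratios approach the upper bounds.
    for beta in (0.5, 1.0, 2.0):
        f_ratio, fm_ratio = ratios_at_threshold(1.0, 0.999999, beta)
        print(f"beta={beta}: F ratio={f_ratio:.4f}, FM ratio={fm_ratio:.4f}")
```

In the near-chance limit (TPR = 1.0, FPR = 0.999999), `ratios_at_threshold` returns $F_{\beta}$ ratios within about $10^{-6}$ of 1.8, 1.5 and 1.2. A stronger classifier with TPR = 0.9 and FPR = 0.05 gives ratios of about 1.11, well inside the stated bounds.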
