In this work, we aim to calibrate the score outputs of an estimator for the binary classification problem by finding an 'optimal' mapping to class probabilities, where 'optimal' means the mapping that minimizes the classification error (or, equivalently, maximizes the accuracy). We show that, for the given target variables and the score outputs of an estimator, an 'optimal' soft mapping, which monotonically maps the score values to probabilities, is in fact a hard mapping that maps the score values to $0$ and $1$. We show that this hard-mapping characteristic is preserved for class-weighted errors (where the accuracy on one class is more important), sample-weighted errors (where accurate classification of the samples is not equally important), and even general linear losses. We propose a sequential recursive merger approach, which produces an 'optimal' hard mapping (for the samples observed so far) sequentially with each incoming new sample. Our approach has a time complexity that is logarithmic in the sample size, which is optimally efficient.
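To illustrate the hard-mapping result, the following minimal sketch searches offline for the single threshold that minimizes the (sample-weighted) classification error, i.e., it produces the 0/1 hard mapping directly; it is not the paper's sequential recursive merger algorithm, and the function name `best_hard_threshold` and the toy data are illustrative assumptions.

```python
import numpy as np

def best_hard_threshold(scores, labels, sample_weights=None):
    """Brute-force illustration (not the sequential merger of the paper):
    find a threshold t so that mapping score -> 1[score > t] minimizes the
    (sample-weighted) classification error, showing that the error-minimizing
    monotone calibration map is a hard 0/1 mapping."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    w = np.ones_like(scores) if sample_weights is None else np.asarray(sample_weights, dtype=float)

    order = np.argsort(scores)
    s, y, w = scores[order], labels[order], w[order]
    n = len(s)

    # Threshold below every score: everything is predicted 1, so all weighted negatives are errors.
    err_all_one = float(np.sum(w * (y == 0)))
    best_err, best_t = err_all_one, s[0] - 1.0

    # Raising the threshold to s[i] flips samples 0..i to prediction 0:
    # their weighted positives become errors, their weighted negatives stop being errors.
    cum = np.cumsum(w * (y == 1) - w * (y == 0))
    for i in range(n):
        if i < n - 1 and s[i] == s[i + 1]:
            continue  # equal scores must be mapped together
        err = err_all_one + cum[i]
        if err < best_err:
            best_err, best_t = err, s[i]
    return best_t, best_err

# Toy usage: two negatives and three positives with overlapping scores.
scores = [0.1, 0.4, 0.35, 0.8, 0.7]
labels = [0, 0, 1, 1, 1]
print(best_hard_threshold(scores, labels))  # one misclassification is unavoidable here
```

This offline search costs $O(n \log n)$ for $n$ samples because of the sort; the sequential approach described above instead maintains the 'optimal' hard mapping incrementally, at logarithmic cost per incoming sample.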