This paper explores the calibration of classifier output scores in binary classification problems. A calibrator is a function that maps the arbitrary classifier score of a test observation onto $[0,1]$ to provide an estimate of the posterior probability of belonging to one of the two classes. Calibration is important for two reasons: first, it provides a meaningful score, namely the posterior probability; second, it puts the scores of different classifiers on the same scale for comparable interpretation. The paper presents three main contributions: (1) introducing multi-score calibration, where more than one classifier provides a score for a single observation; (2) introducing the idea that the classifier scores fed to a calibration process are nothing but features to a classifier, hence proposing expanding the classifier scores to higher dimensions to boost the calibrator's performance; (3) conducting a massive simulation study, on the order of 24,000 experiments, that incorporates different configurations, in addition to experimenting on two real datasets from the cybersecurity domain. The results show that there is no overall winner among the different calibrators and configurations. However, general advice for practitioners includes the following: Platt's calibrator~\citep{Platt1999ProbabilisticOutputsForSupport}, a variant of logistic regression that reduces bias for small sample sizes, has a very stable and acceptable performance across all experiments; our proposed multi-score calibration provides better performance than single-score calibration in the majority of experiments, including the two real datasets. In addition, expanding the scores can help in some experiments.
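To make the single-score versus multi-score distinction concrete, the following is a minimal sketch, not the paper's actual method: a logistic-regression calibrator fitted on one classifier's raw score, then on the scores of two classifiers jointly. The synthetic scores and all variable names are illustrative assumptions; Platt's original formulation additionally applies smoothed targets, which plain `LogisticRegression` omits.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)  # binary class labels

# Synthetic raw scores from two hypothetical classifiers:
# both shift upward for the positive class, with different noise levels.
s1 = y + rng.normal(0.0, 1.0, n)
s2 = y + rng.normal(0.0, 1.5, n)

# Single-score calibration: a logistic fit on one classifier's score,
# mapping it onto [0, 1] as a posterior-probability estimate.
cal_single = LogisticRegression().fit(s1.reshape(-1, 1), y)
p_single = cal_single.predict_proba(s1.reshape(-1, 1))[:, 1]

# Multi-score calibration: both classifiers' scores enter as features
# of a single calibrator, as in contribution (1) above.
X = np.column_stack([s1, s2])
cal_multi = LogisticRegression().fit(X, y)
p_multi = cal_multi.predict_proba(X)[:, 1]
```

Contribution (2) would correspond to expanding `X` with transformations of the scores (e.g. polynomial terms) before fitting, since the calibrator treats the scores as ordinary features.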