Calibrating neural networks is essential when they are deployed in safety-critical applications where downstream decision making depends on the predicted probabilities. Measuring calibration error amounts to comparing two empirical distributions. In this work, we introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test, in which the main idea is to compare the respective cumulative probability distributions. From this, by approximating the empirical cumulative distribution with a differentiable function via splines, we obtain a recalibration function that maps the network outputs to actual (calibrated) class assignment probabilities. The spline fitting is performed on a held-out calibration set, and the resulting recalibration function is evaluated on an unseen test set. We tested our method against existing calibration approaches on various image classification datasets; our spline-based recalibration approach consistently outperforms existing methods on KS error as well as on other commonly used calibration measures. Our code is available at https://github.com/kartikgupta-at-anu/spline-calibration.
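To make the idea of comparing cumulative distributions concrete, the following is a minimal sketch of a KS-style, binning-free calibration error for top-label predictions. It is an illustration under stated assumptions, not the code from the repository above: the function name `ks_calibration_error` and its arguments are hypothetical, and the sketch assumes the error is taken as the maximum absolute gap between the cumulative predicted confidence and the cumulative observed accuracy, with samples sorted by confidence.

```python
import numpy as np


def ks_calibration_error(confidences, correct):
    """KS-style calibration error (illustrative sketch, hypothetical name).

    confidences: 1-D array of predicted top-label probabilities in [0, 1].
    correct:     1-D array of 0/1 flags, 1 where the top-label prediction was right.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    if confidences.ndim != 1 or confidences.shape != correct.shape:
        raise ValueError("confidences and correct must be 1-D arrays of equal length")
    n = confidences.size
    if n == 0:
        raise ValueError("at least one sample is required")

    # Sort samples by predicted confidence (no binning is involved).
    order = np.argsort(confidences, kind="stable")

    # Cumulative predicted probability and cumulative observed accuracy,
    # each normalised by the number of samples.
    cum_conf = np.cumsum(confidences[order]) / n
    cum_acc = np.cumsum(correct[order]) / n

    # Assumption: the error is the largest gap between the two cumulative curves.
    return float(np.max(np.abs(cum_conf - cum_acc)))


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    conf = [0.6, 0.8, 0.9, 0.7]
    hit = [1, 1, 1, 0]
    print(ks_calibration_error(conf, hit))
```

In this sketch a perfectly calibrated model would have two cumulative curves that coincide in expectation, so the value approaches zero as the sample grows. Fitting a differentiable spline to the cumulative accuracy curve and taking its derivative, as the abstract describes, is not shown here.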