Calibrated probabilistic classifiers are models whose predicted probabilities can be directly interpreted as uncertainty estimates. Recent work has shown that deep neural networks are poorly calibrated and tend to output overconfident predictions. As a remedy, we propose a low-bias, trainable calibration error estimator based on Dirichlet kernel density estimates, which asymptotically converges to the true $L_p$ calibration error. This novel estimator enables us to tackle the strongest notion of multiclass calibration, called canonical (or distribution) calibration, while other common calibration methods are tractable only for top-label and marginal calibration. The computational complexity of our estimator is $\mathcal{O}(n^2)$, the convergence rate is $\mathcal{O}(n^{-1/2})$, and it is unbiased up to $\mathcal{O}(n^{-2})$, achieved by a geometric series debiasing scheme. In practice, this means that the estimator can be applied to small subsets of data, enabling efficient estimation and mini-batch updates. The proposed method has a natural choice of kernel, and can be used to generate consistent estimates of other quantities based on conditional expectation, such as the sharpness of a probabilistic classifier. Empirical results validate the correctness of our estimator, and demonstrate its utility in canonical calibration error estimation and calibration error regularized risk minimization.
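To make the idea concrete, the following is a minimal sketch of an $L_p$ canonical calibration error estimate built from Dirichlet kernels, not the paper's exact (debiased) estimator. It assumes a leave-one-out Nadaraya–Watson regression of the one-hot labels on the predicted probability vectors, with Dirichlet kernels centered at each prediction; the bandwidth `h`, the concentration rule `alpha = probs / h + 1`, and the function names are illustrative choices.

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_log_kernel(z, alpha):
    # log-density of Dir(alpha) evaluated at z, summed over the last axis
    return (gammaln(alpha.sum(-1)) - gammaln(alpha).sum(-1)
            + ((alpha - 1.0) * np.log(z)).sum(-1))

def calibration_error_lp(probs, labels_onehot, h=0.1, p=2):
    """Plug-in kernel estimate of the L_p canonical calibration error.

    probs:         (n, k) predicted probability vectors (rows sum to 1)
    labels_onehot: (n, k) one-hot encoded true labels
    h:             kernel bandwidth (illustrative choice)
    """
    z = np.clip(probs, 1e-12, 1.0)               # avoid log(0) on the simplex boundary
    alpha = probs / h + 1.0                      # Dirichlet concentration per center
    # Kernel matrix K[i, j] = Dir(probs[i]; probs[j] / h + 1)
    logK = dirichlet_log_kernel(z[:, None, :], alpha[None, :, :])
    K = np.exp(logK)
    np.fill_diagonal(K, 0.0)                     # leave-one-out: drop self-matches
    # Kernel-regression estimate of E[y | f(x) = probs[i]]
    cond = (K @ labels_onehot) / K.sum(axis=1, keepdims=True)
    # Average L_p deviation between conditional label frequency and prediction
    return np.mean(np.abs(cond - probs) ** p) ** (1.0 / p)
```

The $\mathcal{O}(n^2)$ cost in the abstract corresponds to the pairwise kernel matrix here; because every step is a differentiable array operation, the same construction can serve as a calibration-regularizing training loss on mini-batches.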