Deep neural network (DNN) classifiers are often overconfident, producing miscalibrated class probabilities. In high-risk applications like healthcare, practitioners require $\textit{fully calibrated}$ probability predictions for decision-making. That is, conditioned on the prediction $\textit{vector}$, $\textit{every}$ class' probability should be close to the predicted value. Most existing calibration methods either lack theoretical guarantees for producing calibrated outputs, reduce classification accuracy in the process, or only calibrate the predicted class. This paper proposes a new Kernel-based calibration method called KCal. Unlike existing calibration procedures, KCal does not operate directly on the logits or softmax outputs of the DNN. Instead, KCal learns a metric space on the penultimate-layer latent embedding and generates predictions using kernel density estimates on a calibration set. We first analyze KCal theoretically, showing that it enjoys a provable $\textit{full}$ calibration guarantee. Then, through extensive experiments across a variety of datasets, we show that KCal consistently outperforms baselines as measured by the calibration error and by proper scoring rules like the Brier Score.
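As a minimal sketch of the prediction step described above, the following illustrates how class probabilities can be read off from a kernel density estimate over a labeled calibration set. All names here are illustrative, not the paper's implementation: the learned metric is stood in for by a fixed projection matrix A, and a Gaussian kernel with bandwidth h is assumed.

import numpy as np

def kde_class_probs(z, cal_embeddings, cal_labels, A, h=1.0, num_classes=10):
    """Predict a class-probability vector for one test embedding `z` using a
    Nadaraya-Watson-style kernel estimate over a held-out calibration set."""
    # Map the test point and the calibration embeddings into the metric space.
    z_proj = A @ z                      # shape (d',)
    cal_proj = cal_embeddings @ A.T     # shape (n, d')

    # Gaussian kernel weight for each calibration point, based on distance.
    sq_dists = np.sum((cal_proj - z_proj) ** 2, axis=1)
    weights = np.exp(-sq_dists / (2 * h ** 2))

    # Accumulate kernel mass per class, then normalize to a probability vector.
    probs = np.zeros(num_classes)
    np.add.at(probs, cal_labels, weights)
    return probs / probs.sum()

Because each probability is obtained by normalizing kernel mass accumulated over labeled calibration points rather than by rescaling logits, the output is a valid probability vector whose calibration quality hinges on the learned metric and bandwidth, which is precisely the component KCal trains.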