Post-hoc multi-class calibration is a common approach for providing high-quality confidence estimates for deep neural network predictions. Recent work has shown that widely used scaling methods underestimate their calibration error, while alternative Histogram Binning (HB) methods often fail to preserve classification accuracy. When classes have small prior probabilities, HB also suffers from severe sample inefficiency once the problem is converted into K one-vs-rest class-wise calibration problems. The goal of this paper is to resolve these issues of HB in order to provide calibrated confidence estimates using only a small holdout calibration dataset for bin optimization, while preserving multi-class ranking accuracy. From an information-theoretic perspective, we derive the I-Max concept for binning, which maximizes the mutual information between labels and quantized logits. This concept mitigates the potential loss in ranking performance caused by lossy quantization and, by disentangling the optimization of bin edges from that of bin representatives, allows ranking and calibration performance to be improved simultaneously. To improve sample efficiency and the estimates obtained from a small calibration set, we propose a shared class-wise (sCW) calibration strategy, which shares one calibrator among similar classes (e.g., those with similar class priors) so that the training sets of their class-wise calibration problems can be merged to train a single calibrator. The combination of sCW and I-Max binning outperforms state-of-the-art calibration methods on various evaluation metrics across different benchmark datasets and models, using a small calibration set (e.g., 1k samples for ImageNet).
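To make the two ideas in the abstract more concrete, the following is a minimal sketch, not the paper's implementation. It assumes NumPy, and all function names (`one_vs_rest_merge`, `mutual_information`, `bin_representatives`) are hypothetical. It merges all K one-vs-rest problems into one shared set (the simplest reading of sCW, with every class sharing a single calibrator), evaluates the empirical mutual information between binary labels and quantized logits for a given set of bin edges, and sets each bin representative to the empirical positive frequency in that bin. The bin edges here are fixed by hand; the actual I-Max procedure optimizes them, which this sketch does not attempt.

```python
import numpy as np

def one_vs_rest_merge(logits, labels):
    """Merge the K one-vs-rest problems into one shared calibration set.

    logits: (N, K) array, labels: (N,) integer class indices.
    Returns flattened logits (N*K,) and binary labels (N*K,).
    Assumption: all classes share one calibrator.
    """
    n, k = logits.shape
    binary = np.zeros((n, k), dtype=int)
    binary[np.arange(n), labels] = 1
    return logits.ravel(), binary.ravel()

def mutual_information(scores, binary, edges):
    """Plug-in estimate of I(Y; Q) in nats, where Q is the bin index."""
    bins = np.digitize(scores, edges)        # bin index in 0..len(edges)
    n_bins = len(edges) + 1
    joint = np.zeros((2, n_bins))
    np.add.at(joint, (binary, bins), 1.0)    # count (label, bin) pairs
    joint /= joint.sum()
    p_y = joint.sum(axis=1, keepdims=True)   # marginal over labels
    p_q = joint.sum(axis=0, keepdims=True)   # marginal over bins
    mask = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_y @ p_q)[mask])))

def bin_representatives(scores, binary, edges):
    """Empirical positive frequency per bin (NaN for empty bins)."""
    bins = np.digitize(scores, edges)
    n_bins = len(edges) + 1
    pos = np.bincount(bins, weights=binary, minlength=n_bins)
    cnt = np.bincount(bins, minlength=n_bins)
    return np.divide(pos, cnt, out=np.full(n_bins, np.nan), where=cnt > 0)

# Toy usage on synthetic data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
toy_logits = rng.normal(size=(200, 4))
toy_labels = toy_logits.argmax(axis=1)
scores, binary = one_vs_rest_merge(toy_logits, toy_labels)
edges = np.array([0.0, 1.0])                 # hand-picked, not optimized
print(mutual_information(scores, binary, edges))
print(bin_representatives(scores, binary, edges))
```

Because the bin edges determine the mutual information (and hence how much ranking information survives quantization) while the representatives determine the calibrated confidence values, the two can be chosen separately, which is the disentangling the abstract refers to.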