We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors. While the notion of perfect calibration is well understood, there is no consensus on how to quantify the distance from perfect calibration. Numerous calibration measures have been proposed in the literature, but it is unclear how they compare to each other, and many popular measures such as Expected Calibration Error (ECE) fail to satisfy basic properties like continuity. We present a rigorous framework for analyzing calibration measures, inspired by the literature on property testing. We propose a ground-truth notion of distance from calibration: the $\ell_1$ distance to the nearest perfectly calibrated predictor. We define a consistent calibration measure as one that is a polynomial-factor approximation to this distance. Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently: smooth calibration, interval calibration, and Laplace kernel calibration. The former two give quadratic approximations to the ground-truth distance, which we show is information-theoretically optimal. Our work thus establishes fundamental lower and upper bounds on measuring the distance to calibration, and also provides theoretical justification for preferring certain metrics (like Laplace kernel calibration) in practice.
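To make these notions concrete, here is a minimal formalization in our own notation (the precise distributional definitions and constants appear in the body of the paper): for a predictor $f$ over inputs $x \sim \mathcal{D}$ with binary labels $y \in \{0,1\}$, the ground-truth distance can be written as
\[
  \underline{\mathrm{dCE}}(f) \;=\; \inf_{g \text{ perfectly calibrated}} \; \mathbb{E}_{x \sim \mathcal{D}}\bigl[\,\lvert f(x) - g(x) \rvert\,\bigr],
\]
where $g$ is perfectly calibrated if $\mathbb{E}[y \mid g(x) = v] = v$ for every value $v$ it takes. In this notation, a calibration measure $\mu$ being a polynomial-factor approximation amounts to a two-sided sandwich $c_1\,\underline{\mathrm{dCE}}(f)^{a} \le \mu(f) \le c_2\,\underline{\mathrm{dCE}}(f)^{b}$ for constants $a \ge b > 0$ and $c_1, c_2 > 0$, and a quadratic approximation is the case $a = 2$, $b = 1$, i.e. $\mu(f) \lesssim \underline{\mathrm{dCE}}(f) \lesssim \sqrt{\mu(f)}$ up to constant factors.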