In high-risk settings, well-calibrated probabilistic predictive models are a crucial requirement. However, estimators of calibration error cannot always correctly distinguish which of two models is better calibrated. We propose the \emph{conditional kernel calibration error} (CKCE), based on the Hilbert-Schmidt norm of the difference between conditional mean operators. Because it works directly with the definition of strong calibration as a distance between conditional distributions, which we represent by their embeddings in reproducing kernel Hilbert spaces, the CKCE is less sensitive to the marginal distribution of the predictive models, making it more reliable for relative comparisons than previously proposed calibration metrics. Our experiments on both synthetic and real data show that the CKCE provides a more consistent ranking of models by calibration error and is more robust to distribution shift.
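For concreteness, the sketch below shows one plausible plug-in estimate of a squared CKCE-style quantity for a classifier: it compares the kernel-regression estimate of the conditional mean embedding of $Y$ given the prediction $f(X)$ against the embedding the prediction itself would induce under strong calibration, and takes the squared Hilbert-Schmidt norm of the difference. The Gaussian kernel on predicted probability vectors, the discrete (delta) kernel on labels, the Tikhonov regularisation, and the name `ckce_squared` are all illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def ckce_squared(probs, labels, sigma=0.5, lam=1e-3):
    """Plug-in estimate of a squared CKCE-style quantity (a sketch).

    probs:  (n, m) array; row i is the predicted class distribution f(x_i)
    labels: (n,) integer class labels y_i in {0, ..., m-1}

    Assumptions (not from the paper): a Gaussian kernel l on predicted
    probability vectors, the delta kernel k on labels, and
    Tikhonov-regularised conditional mean operator estimates.
    """
    n = probs.shape[0]

    # Gram matrix L on predictive distributions: L[i, j] = l(p_i, p_j).
    sq = np.sum((probs[:, None, :] - probs[None, :, :]) ** 2, axis=-1)
    L = np.exp(-sq / (2.0 * sigma**2))

    # G[i, j] = <phi(y_i) - mu_{p_i}, phi(y_j) - mu_{p_j}> in the label
    # RKHS, where mu_p is the mean embedding of the prediction p.
    # With the delta kernel: k(y_i, y_j) = 1{y_i = y_j},
    # mu_{p_i}(y_j) = p_i[y_j], and <mu_{p_i}, mu_{p_j}> = p_i . p_j.
    K = (labels[:, None] == labels[None, :]).astype(float)
    P_at_y = probs[:, labels]          # P_at_y[i, j] = p_i[y_j]
    G = K - P_at_y - P_at_y.T + probs @ probs.T

    # || C_hat_{Y|f(X)} - C_hat_f ||_HS^2
    #   = tr( (L + n*lam*I)^{-1} G (L + n*lam*I)^{-1} L )
    R = np.linalg.solve(L + n * lam * np.eye(n), np.eye(n))
    return float(np.trace(R @ G @ R @ L))
```

Under these assumptions, evaluating `ckce_squared` for two models on the same held-out data would give the kind of relative calibration ranking the abstract describes, with the prediction-space kernel making the comparison less dependent on each model's marginal distribution of predictions.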