Probability predictions from binary regressions or machine learning methods ought to be calibrated: If an event is predicted to occur with probability $x$, it should materialize with approximately that frequency, which means that the so-called calibration curve $p(\cdot)$ should equal the identity, $p(x) = x$ for all $x$ in the unit interval. We propose honest calibration assessment based on novel confidence bands for the calibration curve, which are valid subject only to the natural assumption of isotonicity. Besides testing the classical goodness-of-fit null hypothesis of perfect calibration, our bands facilitate inverted goodness-of-fit tests whose rejection allows for the sought-after conclusion of a sufficiently well-specified model. We show that our bands have a finite-sample coverage guarantee, are narrower than those of existing approaches, and adapt to the local smoothness of the calibration curve $p$ and the local variance of the binary observations. In an application to model predictions of an infant having a low birth weight, the bands give informative insights on model calibration.
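To illustrate the isotonicity assumption underlying the bands, the calibration curve can be estimated nonparametrically by isotonic regression of the binary outcomes on the sorted predictions, computed with the pool-adjacent-violators algorithm. The following is a minimal sketch of such an isotonic calibration-curve estimate, not the paper's confidence-band construction; the function name `pava` is illustrative.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: nondecreasing least-squares fit to y."""
    vals, wts, sizes = [], [], []  # stacks of block means, weights, sizes
    for yi in np.asarray(y, dtype=float):
        vals.append(yi); wts.append(1.0); sizes.append(1)
        # Pool blocks while monotonicity is violated.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            v = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w
            s = sizes[-2] + sizes[-1]
            vals.pop(); wts.pop(); sizes.pop()
            vals[-1], wts[-1], sizes[-1] = v, w, s
    return np.repeat(vals, sizes)

# Toy data: predicted probabilities x and binary outcomes y.
x = np.array([0.9, 0.1, 0.4, 0.6])
y = np.array([1, 0, 1, 0])
order = np.argsort(x)
p_hat = pava(y[order])  # calibration-curve estimate at the sorted x values
# p_hat is [0.0, 0.5, 0.5, 1.0]
```

Perfect calibration would correspond to `p_hat` tracking the sorted `x` values; the confidence bands of the paper quantify how far the two may plausibly deviate.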