Calibrated Predictive Distributions via Diagnostics for Conditional Coverage

Uncertainty quantification is crucial for assessing the predictive ability of AI algorithms. A large body of work (including normalizing flows and Bayesian neural networks) has been devoted to describing the entire predictive distribution (PD) of a target variable $Y$ given input features $\mathbf{X}$. However, off-the-shelf PDs are usually far from being conditionally calibrated; i.e., the probability that an event occurs given input $\mathbf{X}$ can differ significantly from the predicted probability. Most current research on predictive inference (such as conformal prediction) concerns constructing prediction sets that not only provide correct uncertainties on average over the entire population (that is, averaging over $\mathbf{X}$), but are also approximately conditionally calibrated, with accurate uncertainties for individual instances. It is often believed that the problem of obtaining and assessing entire conditionally calibrated PDs is too challenging to approach. In this work, we show that recalibration as well as validation are indeed attainable goals in practice. Our proposed method relies on the idea of regressing probability integral transform (PIT) scores against $\mathbf{X}$. This regression gives full diagnostics of conditional coverage across the entire feature space and can be used to recalibrate misspecified PDs. We benchmark our corrected prediction bands against oracle bands and state-of-the-art predictive inference algorithms for synthetic data, including settings with distributional shift and dependent high-dimensional sequence data. Finally, we demonstrate an application to the physical sciences in which we assess and produce calibrated PDs for measurements of galaxy distances using imaging data (i.e., photometric redshifts).
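To make the idea of regressing PIT scores against $\mathbf{X}$ concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional feature, a deliberately misspecified Gaussian PD with fixed unit variance, and a simple Gaussian-kernel (Nadaraya-Watson) regression as a stand-in for whatever regression method is used in practice. The PIT score of an observation is the predicted CDF evaluated at the observed value, $\mathrm{PIT} = \hat{F}(y \mid \mathbf{x})$. For a conditionally calibrated PD, $P(\mathrm{PIT} \le \alpha \mid \mathbf{x}) = \alpha$ for every $\alpha$ and $\mathbf{x}$, so the estimated deviation from $\alpha$ serves as a local diagnostic. All function names, the synthetic data, and the bandwidth are illustrative assumptions.

```python
import math
import numpy as np


def normal_cdf(z):
    """Standard normal CDF, vectorized with math.erf (no SciPy needed)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


def pit_scores(y, mu, sigma):
    """PIT of y under a Gaussian predictive distribution N(mu, sigma^2)."""
    return normal_cdf((y - mu) / sigma)


def conditional_pit_cdf(x_train, pit_train, x_eval, alphas, bandwidth=0.1):
    """Estimate r_alpha(x) = P(PIT <= alpha | X = x) by kernel regression.

    Hypothetical helper: regresses the indicator 1{PIT <= alpha} on x
    with Gaussian kernel weights. Returns shape (len(x_eval), len(alphas)).
    """
    diff = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diff ** 2)
    weights /= weights.sum(axis=1, keepdims=True)
    indicators = (pit_train[:, None] <= alphas[None, :]).astype(float)
    return weights @ indicators


# Synthetic data (assumption): the true noise scale grows with |x|.
rng = np.random.default_rng(0)
n = 20000
x = rng.uniform(-1.0, 1.0, size=n)
y = rng.normal(0.0, 0.5 + np.abs(x))

# Misspecified PD (assumption): N(0, 1) regardless of x.
pit = pit_scores(y, mu=0.0, sigma=1.0)

# Diagnostic: compare the estimated conditional PIT CDF with the identity.
alphas = np.linspace(0.05, 0.95, 19)
x_eval = np.array([0.0, 0.9])
r_hat = conditional_pit_cdf(x, pit, x_eval, alphas)
deviation = r_hat - alphas[None, :]  # zero everywhere if calibrated at x
```

In this sketch the fixed-variance PD is too wide near $x = 0$, so the estimated curve falls below $\alpha$ for small $\alpha$, and too narrow near $x = 0.9$, so it rises above $\alpha$. The same estimate can in principle be used for recalibration by composing it with the original CDF, i.e. mapping $\hat{F}(y \mid \mathbf{x})$ through the estimated $r_\alpha(\mathbf{x})$ at $\alpha = \hat{F}(y \mid \mathbf{x})$.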
