Uncertainty quantification is crucial for assessing the predictive ability of AI algorithms. A large body of work (including normalizing flows and Bayesian neural networks) has been devoted to describing the entire predictive distribution (PD) of a target variable Y given input features $\mathbf{X}$. However, off-the-shelf PDs are usually far from being conditionally calibrated; i.e., the probability of occurrence of an event given input $\mathbf{X}$ can be significantly different from the predicted probability. Most current research on predictive inference (such as conformal prediction) concerns constructing calibrated prediction sets only. It is often believed that the problem of obtaining and assessing entire conditionally calibrated PDs is too challenging. In this work, we show that recalibration, as well as diagnostics of entire PDs, are indeed attainable goals in practice. Our proposed method relies on the idea of regressing probability integral transform (PIT) scores against $\mathbf{X}$. This regression gives full diagnostics of conditional coverage across the entire feature space and can be used to recalibrate misspecified PDs. We benchmark our corrected prediction bands against oracle bands and state-of-the-art predictive inference algorithms for synthetic data, including settings with a distributional shift. Finally, we produce calibrated PDs for two applications: (i) probabilistic nowcasting based on sequences of satellite images, and (ii) estimation of galaxy distances based on imaging data (photometric redshifts).
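The core diagnostic idea, regressing probability integral transform (PIT) scores against the features, can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the paper's implementation: the toy data-generating process, the deliberately misspecified Gaussian predictive distribution, and the choice of a scikit-learn classifier as the regression engine.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy data: Y | X ~ N(X, 1), but the predictive distribution (PD)
# misspecifies the standard deviation as 2 (overdispersed).
n = 5000
X = rng.uniform(-2, 2, size=(n, 1))
Y = X[:, 0] + rng.normal(size=n)

# PIT scores under the misspecified PD F(y | x) = Phi((y - x) / 2).
pit = norm.cdf(Y, loc=X[:, 0], scale=2.0)

# Diagnostic: regress the indicator 1{PIT <= alpha} on X.  If the PD were
# conditionally calibrated, the fitted probability would equal alpha
# everywhere in feature space; deviations flag local miscalibration.
alpha = 0.25
clf = GradientBoostingClassifier().fit(X, (pit <= alpha).astype(int))
local_coverage = clf.predict_proba(X)[:, 1]

# For this overdispersed PD the estimated coverage falls well below
# alpha = 0.25, so the diagnostic detects the miscalibration.
print(local_coverage.mean())
```

In the paper's framework, such fitted coverage curves across a grid of `alpha` values provide full diagnostics of conditional coverage over the feature space, and the same regression estimates can be inverted to recalibrate the misspecified PD.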