Probabilistic models must be well calibrated to support reliable decision-making. While calibration in single-output regression is well studied, defining and achieving multivariate calibration in multi-output regression remains considerably more challenging. The existing literature on multivariate calibration primarily focuses on diagnostic tools based on pre-rank functions, which are projections that reduce multivariate prediction-observation pairs to univariate summaries to detect specific types of miscalibration. In this work, we go beyond diagnostics and introduce a general regularization framework to enforce multivariate calibration during training for arbitrary pre-rank functions. This framework encompasses existing approaches such as highest density region calibration and copula calibration. Our method enforces calibration by penalizing deviations of the projected probability integral transforms (PITs) from the uniform distribution, and can be added as a regularization term to the loss function of any probabilistic predictor. Specifically, we propose a regularization loss that jointly enforces both marginal and multivariate pre-rank calibration. We also introduce a new PCA-based pre-rank that captures calibration along directions of maximal variance in the predictive distribution, while also enabling dimensionality reduction. Across 18 real-world multi-output regression datasets, we show that unregularized models are consistently miscalibrated, and that our methods significantly improve calibration across all pre-rank functions without sacrificing predictive accuracy.
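To make the core mechanism concrete, here is a minimal sketch of a PIT-uniformity penalty of the kind the abstract describes. This is not the paper's implementation: it assumes Gaussian predictive marginals, uses a simple Cramér–von Mises-style distance between sorted PITs and uniform quantiles, and the function name is illustrative.

```python
import numpy as np
from scipy.stats import norm

def pit_uniformity_penalty(y, mu, sigma):
    """Penalize deviation of empirical PITs from the uniform distribution.

    Assumes Gaussian predictive marginals N(mu_i, sigma_i^2); the PIT of each
    observation is the predictive CDF evaluated at it. For a calibrated model
    the PITs are approximately U(0, 1), so their sorted values should track
    the uniform quantiles (i - 0.5) / n. The mean squared gap is a
    Cramér–von Mises-style miscalibration score that could serve as a
    regularization term.
    """
    pits = norm.cdf(y, loc=mu, scale=sigma)  # probability integral transforms
    n = len(pits)
    uniform_quantiles = (np.arange(1, n + 1) - 0.5) / n
    return np.mean((np.sort(pits) - uniform_quantiles) ** 2)

rng = np.random.default_rng(0)
mu, sigma = np.zeros(5000), np.ones(5000)
# Calibrated case: observations drawn from the predictive distribution itself.
y_good = rng.normal(mu, sigma)
# Miscalibrated case: true spread is twice the predicted one (overconfidence).
y_bad = rng.normal(mu, 2.0 * sigma)
assert pit_uniformity_penalty(y_good, mu, sigma) < pit_uniformity_penalty(y_bad, mu, sigma)
```

In the multivariate setting of the paper, a pre-rank function would first project each prediction–observation pair to a scalar, and a penalty of this shape would then be applied to the projected PITs.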