Exact conditional coverage in conformal prediction is unattainable without strong, untestable regularity assumptions, so the promise of conformal prediction hinges on finding approximations to conditional guarantees that are realizable in practice. A promising way to make conformal sets depend on the input, and in particular to capture heteroskedasticity, is to estimate the conditional density $\mathbb{P}_{Y|X}$ and conformalize its level sets. Previous work in this vein has focused on nonconformity scores based on the empirical cumulative distribution function (CDF). Such scores are computationally costly, however, and typically require expensive sampling methods. To avoid sampling, we observe that the CDF-based score reduces to a Mahalanobis distance in the case of Gaussian scores, yielding a closed-form expression that can be conformalized directly. Moreover, a Gaussian-based score opens the door to several extensions of the basic conformal method: we show how to construct conformal sets when some output values are missing, how to refine conformal sets as partial information about $Y$ becomes available, and how to construct conformal sets on transformations of the output space. Finally, empirical results indicate that our approach produces conformal sets that approximate conditional coverage in multivariate settings more closely than alternative methods.
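To make the closed-form score concrete, the following is a minimal sketch of standard split conformal prediction with a squared Mahalanobis score. It assumes that some fitted model already supplies a conditional mean and covariance for each input; here those are taken from the synthetic data generator itself. The function names (`mahalanobis_scores`, `conformal_threshold`, `demo`) and the synthetic data are illustrative assumptions, not the authors' implementation, and the paper's exact procedure may differ.

```python
import numpy as np


def mahalanobis_scores(y, mu, cov):
    """Squared Mahalanobis distance of each y[i] from N(mu[i], cov[i]).

    Shapes: y and mu are (n, d); cov is (n, d, d). The squared distance is a
    monotone transform of the distance, so it yields the same conformal sets.
    """
    diff = y - mu                                    # (n, d)
    solved = np.linalg.solve(cov, diff[..., None])   # (n, d, 1), cov^{-1} diff
    return np.einsum("nd,nd->n", diff, solved[..., 0])


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile with the finite-sample (n + 1) correction."""
    cal_scores = np.asarray(cal_scores, dtype=float)
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the set is all of R^d.
        return np.inf
    return np.sort(cal_scores)[k - 1]


def demo(seed=0, n_cal=2000, n_test=2000, alpha=0.1):
    """Synthetic heteroskedastic example (hypothetical data, d = 2)."""
    rng = np.random.default_rng(seed)

    def sample(n):
        x = rng.uniform(0.5, 2.0, size=n)
        mu = np.stack([x, -x], axis=1)
        # Covariance scales with x**2, so the noise is heteroskedastic.
        cov = x[:, None, None] ** 2 * np.array([[1.0, 0.5], [0.5, 1.0]])
        chol = np.linalg.cholesky(cov)
        z = rng.standard_normal((n, 2))
        y = mu + np.einsum("nij,nj->ni", chol, z)
        return y, mu, cov

    y_cal, mu_cal, cov_cal = sample(n_cal)
    y_test, mu_test, cov_test = sample(n_test)

    threshold = conformal_threshold(
        mahalanobis_scores(y_cal, mu_cal, cov_cal), alpha
    )
    # The conformal set for a test input is the ellipsoid
    # {y : (y - mu)^T cov^{-1} (y - mu) <= threshold}.
    covered = mahalanobis_scores(y_test, mu_test, cov_test) <= threshold
    return threshold, covered.mean()


if __name__ == "__main__":
    threshold, coverage = demo()
    print(f"threshold={threshold:.3f}, empirical coverage={coverage:.3f}")
```

Because the score has a closed form, each conformal set is an ellipsoid centred at the predicted mean, and no sampling from the estimated density is needed. The ellipsoid's shape follows the predicted covariance, which is how the set adapts to heteroskedasticity.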