Modern black-box predictive models are often accompanied by weak performance guarantees that hold only asymptotically in the size of the dataset or require strong parametric assumptions. In response, split conformal prediction offers a promising avenue for obtaining finite-sample guarantees under minimal, distribution-free assumptions. Although prediction set validity most often concerns marginal coverage, we explore the related but distinct guarantee of tolerance regions, reformulating known results in the language of nested prediction sets and extending the duality between marginal coverage and tolerance regions. Furthermore, we highlight the connection between split conformal prediction and classical tolerance predictors developed in the 1940s, as well as recent developments in distribution-free risk control. One result that transfers from classical tolerance predictors is that the coverage of a prediction set based on order statistics, conditional on the calibration set, is a random variable that stochastically dominates a Beta distribution. We demonstrate the empirical effectiveness of our findings on synthetic and real datasets using a popular split conformal prediction procedure called conformalized quantile regression (CQR).
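The Beta-distributed conditional coverage mentioned above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the procedure used in the paper: it assumes continuous, exchangeable conformity scores, uses Uniform(0, 1) draws as stand-in scores rather than CQR scores, and takes the threshold to be the k-th smallest calibration score with k = ceil((n + 1)(1 - alpha)). Under these assumptions the coverage conditional on the calibration set follows Beta(k, n + 1 - k) exactly. The function names are hypothetical and chosen for illustration only.

```python
import math
import random


def conformal_rank(n, alpha):
    """Rank k of the calibration score used as threshold: ceil((n + 1) * (1 - alpha))."""
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        raise ValueError("calibration set too small for this alpha")
    return k


def prob_coverage_at_least(n, k, level):
    """P(Beta(k, n + 1 - k) >= level).

    Uses the order-statistic identity: the k-th smallest of n uniforms is
    at least `level` exactly when at most k - 1 of them fall below `level`,
    i.e. P(Binomial(n, level) <= k - 1).
    """
    return sum(
        math.comb(n, j) * level**j * (1 - level) ** (n - j) for j in range(k)
    )


def simulate_conditional_coverage(n, alpha, trials, seed=0):
    """Draw `trials` calibration sets and return the coverage of each one.

    Assumption: scores are Uniform(0, 1), so the probability that a fresh
    score falls below the threshold equals the threshold itself.
    """
    rng = random.Random(seed)
    k = conformal_rank(n, alpha)
    coverages = []
    for _ in range(trials):
        scores = sorted(rng.random() for _ in range(n))
        coverages.append(scores[k - 1])  # k-th smallest calibration score
    return coverages


if __name__ == "__main__":
    n, alpha = 100, 0.1
    k = conformal_rank(n, alpha)
    cov = simulate_conditional_coverage(n, alpha, trials=2000)
    print("mean coverage (simulated):", sum(cov) / len(cov))
    print("mean coverage (Beta mean k/(n+1)):", k / (n + 1))
    print("P(coverage >= 1 - alpha):", prob_coverage_at_least(n, k, 1 - alpha))
```

The simulated mean coverage should sit close to k/(n + 1), which is at least 1 - alpha; this is the marginal guarantee. The last printed quantity is the tolerance-region view: the probability, over calibration sets, that the conditional coverage reaches 1 - alpha. With scores that can tie, the Beta law would be a lower bound in the stochastic-dominance sense rather than an exact description.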