Predicting sets of outcomes -- instead of unique outcomes -- is a promising approach to uncertainty quantification in statistical learning. Despite a rich literature on constructing prediction sets with statistical guarantees, adapting to unknown covariate shift -- a prevalent issue in practice -- remains a serious unsolved challenge. In this paper, we show that, in this setting, prediction sets with a finite-sample coverage guarantee are uninformative, and we propose a novel, flexible, distribution-free method, PredSet-1Step, to efficiently construct prediction sets with an asymptotic coverage guarantee under unknown covariate shift. We formally show that our method is \textit{asymptotically probably approximately correct}, having well-calibrated coverage error with high confidence for large samples. We illustrate that it achieves nominal coverage in a number of experiments and on a data set concerning HIV risk prediction in a South African cohort study. Our theory hinges on a new bound for the convergence rate of the coverage of Wald confidence intervals based on general asymptotically linear estimators.