In predictive modeling for high-stakes decision-making, predictors must be not only accurate but also reliable. Conformal prediction (CP) is a promising approach for quantifying the confidence of predictions with few theoretical assumptions. To obtain a confidence set with so-called full-CP, the predictor must be refit for every possible value of the prediction result, which is feasible only for simple predictors. For complex predictors such as random forests (RFs) or neural networks (NNs), split-CP is often employed instead, in which the data are split into two parts: one for fitting the predictor and the other for computing the confidence set. Unfortunately, because of the reduced sample size, split-CP is inferior to full-CP in both fitting and confidence set computation. In this paper, we develop full-CP for the sparse high-order interaction model (SHIM), which is flexible enough to account for high-order interactions among variables. We resolve the computational challenge of full-CP for SHIM by introducing a novel approach called homotopy mining. Through numerical experiments, we demonstrate that SHIM is as accurate as complex predictors such as RF and NN while enjoying the superior statistical power of full-CP.
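To make the split-CP procedure described above concrete, the following is a minimal sketch under stated assumptions. It is not the method developed in the paper: the function name `split_conformal_interval`, the ordinary least squares predictor, and the absolute-residual score are illustrative choices. Half of the data is used for fitting and the other half for computing the confidence set, which is the sample-size cost the abstract refers to. Full-CP would instead refit the predictor on all the data for each candidate value of the prediction result.

```python
import numpy as np


def split_conformal_interval(X, y, x_new, alpha=0.1, seed=0):
    """Split-CP interval for one test point.

    Assumptions (illustrative only): ordinary least squares stands in for
    the predictor, and the conformity score is the absolute residual.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    fit_idx, cal_idx = idx[: n // 2], idx[n // 2:]

    # Part 1: fit the predictor on the first half only.
    A_fit = np.column_stack([np.ones(len(fit_idx)), X[fit_idx]])
    beta, *_ = np.linalg.lstsq(A_fit, y[fit_idx], rcond=None)

    # Part 2: compute absolute residuals on the held-out half.
    A_cal = np.column_stack([np.ones(len(cal_idx)), X[cal_idx]])
    scores = np.abs(y[cal_idx] - A_cal @ beta)

    # Conformal quantile: the ceil((m + 1)(1 - alpha))-th smallest score.
    m = len(scores)
    k = int(np.ceil((m + 1) * (1 - alpha)))
    q = np.inf if k > m else np.sort(scores)[k - 1]

    # Symmetric interval around the point prediction.
    pred = float(np.concatenate([[1.0], np.atleast_1d(x_new)]) @ beta)
    return pred - q, pred + q
```

The returned pair is the lower and upper end of the confidence set for `x_new` at nominal level `1 - alpha`. Because only half of the samples inform the fit and only half inform the quantile, the interval is typically wider than one obtained from all the data.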