Prediction, where observed data is used to quantify uncertainty about a future observation, is a fundamental problem in statistics. Prediction sets with coverage probability guarantees are a common solution, but these do not provide probabilistic uncertainty quantification in the sense of assigning beliefs to relevant assertions about the future observable. Alternatively, we recommend the use of a {\em probabilistic predictor}, a data-dependent (imprecise) probability distribution for the to-be-predicted observation given the observed data. It is essential that the probabilistic predictor be reliable or valid, and here we offer a notion of validity and explore its behavioral and statistical implications. In particular, we show that valid probabilistic predictors must be imprecise, that they avoid sure loss, and that they lead to prediction procedures with desirable frequentist error rate control properties. We provide a general construction of a provably valid probabilistic predictor, which has close connections to the powerful conformal prediction machinery, and we illustrate this construction in regression and classification applications.
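The abstract does not spell out the construction. The following is a minimal sketch, under stated assumptions, of one way a conformal-style probabilistic predictor can be built:

- **Assumed form.** It is assumed to be a consonant plausibility (possibility) measure. Its contour, pi(y), is a full-conformal p-value for the candidate value y of the future observation.
- **Upper probability.** For an assertion A, the upper probability is the supremum of pi(y) over y in A. Here that supremum is approximated by a maximum over a finite grid of points.
- **Lower probability.** The lower probability of A is one minus the upper probability of its complement.
- **Hypothetical choices.** The setting is a plain exchangeable univariate sample, not the regression or classification applications in the paper. The nonconformity score |z_i - mean(z)| and all function names are hypothetical choices for illustration only.

```python
from statistics import mean


def plausibility_contour(y_obs, y_cand):
    """Hypothetical conformal contour pi(y) for a future Y_{n+1} = y_cand.

    Assumed nonconformity score: |z_i - mean(z)| on the augmented
    sample z = (y_1, ..., y_n, y_cand).
    """
    z = list(y_obs) + [y_cand]
    m = mean(z)
    scores = [abs(zi - m) for zi in z]
    s_new = scores[-1]
    # Fraction of augmented points at least as nonconforming as the candidate
    # (the candidate itself always counts, so pi(y) >= 1 / (n + 1)).
    return sum(s >= s_new for s in scores) / len(z)


def upper_prob(y_obs, assertion_grid):
    """Upper probability of A, approximating sup_{y in A} pi(y) by a max over a grid of points in A."""
    return max(plausibility_contour(y_obs, y) for y in assertion_grid)


def lower_prob(y_obs, complement_grid):
    """Lower probability of A, computed as 1 - upper probability of the complement of A."""
    return 1.0 - upper_prob(y_obs, complement_grid)


def prediction_set(y_obs, grid, alpha):
    """Grid points y with pi(y) > alpha: a conformal-style prediction set at level 1 - alpha."""
    return [y for y in grid if plausibility_contour(y_obs, y) > alpha]


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0]
    A = [2.5 + 0.1 * k for k in range(11)]  # grid over the assertion 2.5 <= Y <= 3.5
    A_c = [-20.0 + 0.5 * k for k in range(45)] + [3.6 + 0.5 * k for k in range(34)]
    print("upper P(A):", upper_prob(y, A))
    print("lower P(A):", lower_prob(y, A_c))
    print("set:", prediction_set(y, [float(k) for k in range(-20, 21)], alpha=0.2))
```

In this toy case the upper probability of A is 1 while its lower probability is well below 1. That gap is the kind of imprecision the abstract refers to. The set of grid points with pi(y) > alpha is the usual conformal prediction set. Under exchangeability, such a set has coverage at least 1 - alpha.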