Though black-box predictors are state-of-the-art for many complex tasks, they often fail to properly quantify predictive uncertainty and may provide inappropriate predictions for unfamiliar data. Instead, we can learn more reliable models by letting them either output a prediction set or abstain when the uncertainty is high. We propose training these selective prediction-set models using an uncertainty-aware loss minimization framework, which unifies ideas from decision theory and robust maximum likelihood. Moreover, since black-box methods are not guaranteed to output well-calibrated prediction sets, we show how to calculate point estimates and confidence intervals for the true coverage of any selective prediction-set model, as well as a uniform mixture of K set models obtained from K-fold sample-splitting. When applied to predicting in-hospital mortality and length-of-stay for ICU patients, our model outperforms existing approaches on both in-sample and out-of-sample age groups, and our recalibration method provides accurate inference for prediction set coverage.
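As a concrete illustration of the coverage-inference idea, the sketch below shows one generic way to compute a point estimate and an exact binomial (Clopper-Pearson) confidence interval for the empirical coverage of a selective prediction-set model on a held-out calibration set. The function name `coverage_with_ci` and its inputs are hypothetical, and this is not the paper's exact estimator (which also covers uniform mixtures of K fold-specific set models); it is a minimal sketch assuming the model either abstains or outputs a set of candidate labels per example.

```python
# Minimal sketch (not the paper's exact procedure): empirical coverage of a
# selective prediction-set model on held-out data, plus an exact
# Clopper-Pearson confidence interval for the true coverage.
import numpy as np
from scipy.stats import beta


def coverage_with_ci(prediction_sets, abstained, y_true, alpha=0.05):
    """Coverage among non-abstained examples, with a 1 - alpha confidence interval."""
    kept = ~np.asarray(abstained)                       # examples the model did not abstain on
    hits = np.array([y in s for s, y in zip(prediction_sets, y_true)])[kept]
    n, k = hits.size, int(hits.sum())
    estimate = k / n
    # Clopper-Pearson exact binomial interval via beta quantiles.
    lower = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    upper = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return estimate, (lower, upper)


# Toy usage: three calibration examples, one abstention.
sets = [{0, 1}, {1}, {2}]
abstain = [False, False, True]
labels = [0, 1, 0]
print(coverage_with_ci(sets, abstain, labels))
```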