In this paper, we present a novel approach to conformal prediction (CP), in which we aim to identify a set of promising prediction candidates in place of a single prediction. This set is guaranteed to contain a correct answer with high probability, and is well-suited for many open-ended classification tasks. In the standard CP paradigm, the predicted set can often be unusably large and also costly to obtain. This is particularly common in settings where the correct answer is not unique and the total number of possible answers is high. First, we expand the CP correctness criterion to allow for additional, inferred "admissible" answers, which can substantially reduce the size of the predicted set while still providing valid performance guarantees. Second, we amortize costs by conformalizing prediction cascades, in which we aggressively prune implausible labels early on by using progressively stronger classifiers, again while still providing valid performance guarantees. We demonstrate the empirical effectiveness of our approach for multiple applications in natural language processing and computational chemistry for drug discovery.
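For readers unfamiliar with the standard CP paradigm that the abstract takes as its starting point, the following is a minimal sketch of split conformal prediction for classification. It is not the method proposed in this paper: it covers neither the expanded "admissible answer" criterion nor the conformalized cascades. The function names `conformal_threshold` and `prediction_set`, the tolerance parameter `epsilon`, and the choice of nonconformity score (for example, one minus the model's probability for a label) are illustrative assumptions. Under the usual exchangeability assumption, the resulting set contains the true label with probability at least `1 - epsilon`.

```python
import math


def conformal_threshold(cal_scores, epsilon):
    """Return the ceil((n + 1) * (1 - epsilon))-th smallest calibration score.

    cal_scores holds the nonconformity score of the TRUE label for each
    held-out calibration example (hypothetical score: 1 - model probability).
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - epsilon))
    if k > n:
        # Too few calibration points for this epsilon: every label must be kept.
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(label_scores, threshold):
    """Keep every candidate label whose nonconformity score is <= threshold."""
    return {label for label, score in label_scores.items() if score <= threshold}


# Illustrative numbers only (not results from the paper).
cal_scores = [0.1, 0.3, 0.2, 0.6, 0.4, 0.05, 0.5, 0.25, 0.15]
threshold = conformal_threshold(cal_scores, epsilon=0.25)

# Nonconformity scores of each candidate label for one new test input.
test_label_scores = {"a": 0.1, "b": 0.5, "c": 0.9}
print(prediction_set(test_label_scores, threshold))
```

In this baseline every candidate label is scored by a single classifier and only the one true label counts as correct, which is the setting in which the abstract notes that predicted sets can become large and costly to obtain.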