Quantifying the data uncertainty in learning tasks is often done by learning a prediction interval or prediction set of the label given the input. Two commonly desired properties for learned prediction sets are \emph{valid coverage} and \emph{good efficiency} (such as low length or low cardinality). Conformal prediction is a powerful technique for learning prediction sets with valid coverage, yet by default its conformalization step only learns a single parameter, and does not optimize the efficiency over more expressive function classes. In this paper, we propose a generalization of conformal prediction to multiple learnable parameters, by considering the constrained empirical risk minimization (ERM) problem of finding the most efficient prediction set subject to valid empirical coverage. This meta-algorithm generalizes existing conformal prediction algorithms, and we show that it achieves approximate valid population coverage and near-optimal efficiency within class, whenever the function class in the conformalization step is low-capacity in a certain sense. Next, this ERM problem is challenging to optimize as it involves a non-differentiable coverage constraint. We develop a gradient-based algorithm for it by approximating the original constrained ERM using differentiable surrogate losses and Lagrangians. Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly over existing approaches in several applications such as prediction intervals with improved length, minimum-volume prediction sets for multi-output regression, and label prediction sets for image classification.
翻译:量化学习任务中的数据不确定性往往是通过学习一个预测间隔或预测标签的标签输入后的预测值来完成的。对于学习的预测组来说,两种共同期望的属性是: emph{valid 覆盖} 和\emph{好的效率 } (例如低长度或低基基值)。 共性预测是学习有效覆盖的预测组的有力技术,但默认情况下,其一致性步骤只学习一个参数,而不是优化比更直观的功能类别的效率。 在本文件中,我们建议将符合性的预测与多个可学习的参数普遍化为一致的参数,方法是考虑到在找到符合有效经验覆盖的最有效的预测组(ERM)方面的有限的经验风险最小化(ERM)问题。 这种元性algorthm 将现有的符合一致的预测算法普遍化,我们表明,只要符合一致性步骤的功能类别在某种意义上的能力较低时,它就会达到接近一个参数。 下一步,这种机构风险管理问题具有挑战性优化,因为它涉及非差别的覆盖范围限制。 我们为它制定了一个基于梯度的预测值的多级- 算法, 能够通过控制原有的系统化的预估测测结果, 来大大改进我们现有的预测效率。