A typical desideratum for quantifying the uncertainty of a classification model as a prediction set is class-conditional singleton-set calibration. That is, such sets should map to the output of well-calibrated selective classifiers, matching the observed frequencies of similar instances. Recent works proposing adaptive and localized conformal p-values for deep networks neither guarantee this behavior nor achieve it empirically. Instead, we use the strong prediction-reliability signals from KNN-based approximations of Transformer networks to construct data-driven partitions for Mondrian Conformal Predictors, which we treat as weak selective classifiers that are then calibrated via a new Inductive Venn Predictor, the Venn-ADMIT Predictor. The resulting selective classifiers are well-calibrated in a conservative but practically useful sense for a given threshold. They are inherently robust to changes in the proportions of the data partitions, and straightforward conservative heuristics provide additional robustness to covariate shifts. We compare against the quantities produced by recent Conformal Predictors on several representative and challenging natural language processing classification tasks, including class-imbalanced and distribution-shifted settings.
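To make the Mondrian (partition-conditional) conformal setup concrete, the following is a minimal, generic sketch of class-conditional split conformal classification with a separate calibration quantile per partition. It is not the paper's Venn-ADMIT method or its KNN-derived partitions; it assumes softmax-style class probabilities, partitions by true class label, and uses toy Dirichlet-sampled data. All names and choices here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy calibration data: softmax-like probability vectors for 3 classes,
# paired with true labels (stand-ins for held-out model outputs).
n_cal, n_classes = 300, 3
cal_probs = rng.dirichlet(np.ones(n_classes), size=n_cal)
cal_labels = rng.integers(0, n_classes, size=n_cal)

# Nonconformity score: 1 - probability assigned to the true class.
cal_scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]

alpha = 0.1  # target miscoverage level
# Mondrian thresholds: one finite-sample-adjusted quantile per class
# partition, so coverage is controlled within each partition separately.
thresholds = np.empty(n_classes)
for k in range(n_classes):
    s = np.sort(cal_scores[cal_labels == k])
    m = len(s)
    i = min(int(np.ceil((m + 1) * (1 - alpha))) - 1, m - 1)
    thresholds[k] = s[i]

def prediction_set(probs):
    """Include class k iff its nonconformity is within the class-k threshold."""
    return [k for k in range(n_classes) if 1.0 - probs[k] <= thresholds[k]]

# A confident toy test point yields a (typically small) prediction set.
print(prediction_set(np.array([0.9, 0.05, 0.05])))
```

Because each partition calibrates its own quantile, the resulting sets remain valid per class even when class proportions shift between calibration and test data, which is the property the abstract's partition-robustness claim builds on.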