Selective classification allows models to abstain from making predictions (e.g., say "I don't know") when in doubt in order to obtain better effective accuracy. While typical selective models can be effective at producing more accurate predictions on average, they may still allow for wrong predictions that have high confidence, or skip correct predictions that have low confidence. Providing calibrated uncertainty estimates alongside predictions -- probabilities that correspond to true frequencies -- can be as important as having predictions that are simply accurate on average. However, uncertainty estimates can be unreliable for certain inputs. In this paper, we develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties. By doing so, we aim to make predictions with well-calibrated uncertainty estimates over the distribution of accepted examples, a property we call selective calibration. We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model. In particular, our work focuses on achieving robust calibration, where the model is intentionally designed to be tested on out-of-domain data. We achieve this through a training strategy inspired by distributionally robust optimization, in which we apply simulated input perturbations to the known, in-domain training data. We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
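As a concrete illustration of the selective calibration property described above (the notation -- input $X$, binary label $Y$, confidence predictor $f$, and accept/reject selector $g$ -- is introduced here only for this sketch and is not taken verbatim from the abstract), a pair $(f, g)$ is selectively calibrated when predicted confidences match true frequencies over the accepted examples:

\[
\mathbb{P}\big(Y = 1 \mid f(X) = p,\; g(X) = 1\big) = p \qquad \text{for all } p \in [0, 1].
\]

Under this reading, the selective calibration error that the selector network is trained to reduce can be understood as a calibration error (e.g., an expected calibration error) computed only over the examples with $g(X) = 1$.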