In recent years, several classification methods that aim to quantify epistemic uncertainty have been proposed, producing predictions in the form of either second-order distributions or sets of probability distributions. In this work, we focus on the latter, also called credal predictors, and address the question of how to evaluate them: What does it mean for a credal predictor to represent epistemic uncertainty in a faithful manner? To answer this question, we refer to the notion of calibration of probabilistic predictors and extend it to credal predictors. Broadly speaking, we call a credal predictor calibrated if it returns sets that cover the true conditional probability distribution. To verify this property for the important case of ensemble-based credal predictors, we propose a novel nonparametric calibration test that generalizes an existing test for probabilistic predictors to the case of credal predictors. Making use of this test, we empirically show that credal predictors based on deep neural networks are often not well calibrated.
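To make the coverage notion concrete for the ensemble-based case, the sketch below checks whether a candidate class distribution lies inside a credal set, under the assumption (made only for this illustration) that the credal set is the convex hull of the ensemble members' predicted distributions. It is not the nonparametric calibration test proposed in the paper, which must work without access to the true conditional distribution; the function name `covers_true_distribution` and the toy numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def covers_true_distribution(ensemble_probs, candidate_dist, tol=1e-6):
    """Check whether a credal set covers a candidate conditional distribution.

    Assumption for this sketch: the credal set is the convex hull of the
    ensemble members' predicted class distributions.

    ensemble_probs : array of shape (m, K) -- m ensemble members, K classes.
    candidate_dist : array of shape (K,)   -- distribution to be covered.
    Returns True if candidate_dist lies in the convex hull (up to `tol`).
    """
    Q = np.asarray(ensemble_probs, dtype=float)   # (m, K)
    p = np.asarray(candidate_dist, dtype=float)   # (K,)
    m = Q.shape[0]

    # Feasibility LP: find convex weights lambda >= 0 with sum(lambda) = 1
    # and Q^T @ lambda = p. If such weights exist, p is in the credal set.
    A_eq = np.vstack([Q.T, np.ones((1, m))])      # (K + 1, m)
    b_eq = np.concatenate([p, [1.0]])             # (K + 1,)
    res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, None)] * m, method="highs")
    return bool(res.success) and np.allclose(A_eq @ res.x, b_eq, atol=tol)


# Toy usage: three ensemble members over three classes.
ensemble = np.array([[0.7, 0.2, 0.1],
                     [0.5, 0.3, 0.2],
                     [0.6, 0.1, 0.3]])
print(covers_true_distribution(ensemble, np.array([0.6, 0.2, 0.2])))  # True: inside
print(covers_true_distribution(ensemble, np.array([0.1, 0.8, 0.1])))  # False: outside
```

The linear-programming feasibility check is one standard way to test convex-hull membership; the actual calibration test in the paper instead has to assess coverage statistically, since the true conditional distribution is never observed directly.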