We propose an evaluation framework for class probability estimates (CPEs) in the presence of label uncertainty, which is commonly observed in the medical domain as diagnostic disagreement between experts. We also formalize evaluation metrics for higher-order statistics, including inter-rater disagreement, to assess predictions of label uncertainty. Moreover, we propose a novel post-hoc method, called $\alpha$-calibration, that equips neural network classifiers with calibrated distributions over CPEs. Using synthetic experiments and a large-scale medical imaging application, we show that our approach significantly enhances the reliability of uncertainty estimates, namely disagreement probabilities and posterior CPEs.
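To make the higher-order evaluation concrete, the following is a minimal sketch, not the paper's implementation, of how a model's predicted inter-rater disagreement can be compared against observed rater labels. It relies only on the elementary fact that, under a predicted class distribution $p$, two independent raters agree with probability $\sum_k p_k^2$, so the predicted disagreement is $1 - \sum_k p_k^2$; the function names and data layout below are illustrative assumptions.

```python
import numpy as np

def predicted_disagreement(probs: np.ndarray) -> np.ndarray:
    """Predicted probability that two independent raters disagree, given
    per-example class probability estimates `probs` of shape (N, K):
    1 - sum_k p_k^2 (the Gini/Simpson diversity of the prediction)."""
    return 1.0 - np.sum(probs ** 2, axis=1)

def empirical_disagreement(rater_labels: np.ndarray) -> np.ndarray:
    """Fraction of unordered rater pairs that disagree, per example.
    `rater_labels` has shape (N, R): R rater labels per example."""
    n, r = rater_labels.shape
    pairs = r * (r - 1) / 2
    agree = np.zeros(n)
    for k in np.unique(rater_labels):
        c = np.sum(rater_labels == k, axis=1)  # raters voting class k
        agree += c * (c - 1) / 2               # agreeing pairs within class k
    return 1.0 - agree / pairs

# Illustrative data: 4 examples, 3 raters, binary labels, and model CPEs.
labels = np.array([[0, 0, 1], [1, 1, 1], [0, 1, 1], [0, 0, 0]])
probs = np.array([[0.7, 0.3], [0.1, 0.9], [0.5, 0.5], [0.95, 0.05]])
print(predicted_disagreement(probs))   # model's disagreement forecast
print(empirical_disagreement(labels))  # observed rater disagreement
# A calibration-style comparison (e.g. a binned gap between the two)
# would quantify how well the forecast tracks the observed disagreement.
```

This is only one of the higher-order statistics the framework covers; the calibrated distributions over CPEs produced by $\alpha$-calibration are defined in the paper itself and are not reproduced here.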