In computational histopathology, algorithms now outperform humans on a range of tasks, but to date none is used for automated diagnosis in the clinic. Before algorithms can take part in such high-stakes decisions, they need to "know when they don't know", i.e., they need to estimate their predictive uncertainty. This allows them to defer potentially erroneous predictions to a human pathologist, which increases their safety. Here, we evaluate the predictive performance and calibration of several uncertainty estimation methods on clinical histopathology data. We show that a distance-aware uncertainty estimation method outperforms commonly used approaches such as Monte Carlo dropout and deep ensembles. However, across all uncertainty estimation methods tested, we observe a drop in predictive performance and calibration on novel samples. We also investigate uncertainty thresholding as a way to reject out-of-distribution samples for selective prediction. We demonstrate the limitations of this approach and suggest areas for future research.
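To make the idea of uncertainty thresholding for selective prediction concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes that uncertainty is measured as the entropy of the predicted class probabilities, which the abstract does not specify. The function names, the threshold value, and the example probabilities are all hypothetical.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def selective_predict(prob_rows, threshold):
    """Split samples into accepted predictions and deferred cases.

    A sample is accepted when its predictive entropy is at or below
    `threshold` (an assumed, user-chosen value); otherwise its index is
    placed in the deferred list, e.g. for review by a pathologist.
    Returns (accepted, deferred), where accepted holds
    (sample_index, predicted_class) pairs.
    """
    accepted, deferred = [], []
    for i, probs in enumerate(prob_rows):
        if predictive_entropy(probs) <= threshold:
            predicted_class = max(range(len(probs)), key=probs.__getitem__)
            accepted.append((i, predicted_class))
        else:
            deferred.append(i)
    return accepted, deferred


# Hypothetical class probabilities for three samples (illustrative only).
probs = [[0.98, 0.02], [0.55, 0.45], [0.10, 0.90]]
accepted, deferred = selective_predict(probs, threshold=0.4)
print(accepted, deferred)
```

With these illustrative values, the second sample has near-maximal entropy and is deferred, while the other two are accepted. Such a rule only helps if the uncertainty estimate is actually high on out-of-distribution inputs, which is the assumption whose limitations are examined here.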