Machine-learning-driven segmentation has become standard in medical image analysis. However, deep learning models are prone to overconfident predictions, which has led to a renewed focus on calibrated predictions in the medical imaging and broader machine learning communities. A calibrated prediction is an estimate of the probability of a label that matches the true expected value of the label conditioned on the predicted confidence. Such predictions are useful in a range of medical imaging applications, including surgical planning under uncertainty and active learning systems. At the same time, for many medical applications the quantity of real importance is an accurate volume measurement. This work investigates the relationship between model calibration and volume estimation. We demonstrate both mathematically and empirically that if the predictor is calibrated per image, the correct volume can be obtained by taking an expectation of the per-pixel/voxel probability scores of the image. Furthermore, we show that convex combinations of calibrated classifiers preserve volume estimation but do not preserve calibration. We therefore conclude that a calibrated predictor is a sufficient, but not necessary, condition for obtaining an unbiased estimate of the volume. We validate our theoretical findings empirically on a collection of 18 different (calibrated) training strategies on two tasks: glioma volume estimation on the BraTS 2018 dataset and ischemic stroke lesion volume estimation on the ISLES 2018 dataset.
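The two claims about volume and convex combinations can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the experimental setup of this work: the uniform foreground probabilities, the two toy predictors, the 50/50 mixing weight, and the binned `expected_calibration_error` helper are all assumptions made for illustration. Predictor A outputs the true per-voxel probability, predictor B outputs a constant base rate (both are approximately calibrated), and their convex combination gives the same summed-probability volume while its calibration error is clearly larger.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels = 200_000  # hypothetical image size, flattened

# Synthetic ground truth (an assumption): each voxel has a foreground
# probability p, and its binary label is a Bernoulli(p) draw.
p = rng.uniform(0.0, 1.0, size=n_voxels)
labels = (rng.uniform(size=n_voxels) < p).astype(float)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned gap between mean confidence and observed label frequency."""
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of voxels in the bin
    return ece


# Two toy predictors that are each (approximately) calibrated.
pred_a = p                             # outputs the true probability
pred_b = np.full(n_voxels, p.mean())   # outputs a constant base rate
pred_mix = 0.5 * pred_a + 0.5 * pred_b  # convex combination

# Volume in voxels: sum of labels versus sum of probability scores.
true_volume = labels.sum()
vol_a = pred_a.sum()
vol_b = pred_b.sum()
vol_mix = pred_mix.sum()

ece_a = expected_calibration_error(pred_a, labels)
ece_b = expected_calibration_error(pred_b, labels)
ece_mix = expected_calibration_error(pred_mix, labels)

print(f"true volume: {true_volume:.0f}")
print(f"A:   volume {vol_a:.0f}, ECE {ece_a:.4f}")
print(f"B:   volume {vol_b:.0f}, ECE {ece_b:.4f}")
print(f"mix: volume {vol_mix:.0f}, ECE {ece_mix:.4f}")
```

In this toy setting the mixture compresses confidences toward the base rate, so a voxel scored at, say, 0.7 is foreground more often than 70% of the time, even though the scores still sum to the same volume. Multiplying the voxel count by the physical voxel size would convert it to a physical volume.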