Objective: Convolutional neural networks (CNNs) have demonstrated promise in automated cardiac magnetic resonance image segmentation. However, when CNNs are applied to a large real-world dataset, it is important to quantify segmentation uncertainty and identify segmentations that could be problematic. In this work, we performed a systematic study of Bayesian and non-Bayesian methods for estimating uncertainty in segmentation neural networks. Methods: We evaluated Bayes by Backprop, Monte Carlo Dropout, Deep Ensembles, and Stochastic Segmentation Networks in terms of segmentation accuracy, probability calibration, uncertainty on out-of-distribution images, and segmentation quality control. Results: We observed that Deep Ensembles outperformed the other methods except for images with heavy noise and blurring distortions. We showed that Bayes by Backprop is more robust to noise distortions while Stochastic Segmentation Networks are more resistant to blurring distortions. For segmentation quality control, we showed that segmentation uncertainty is correlated with segmentation accuracy for all the methods. With the incorporation of uncertainty estimates, we were able to reduce the percentage of poor segmentations to 5% by flagging 31--48% of the most uncertain segmentations for manual review, substantially fewer than the 75--78% of all images that would need to be reviewed under random selection without neural network uncertainty. Conclusion: This work provides a comprehensive evaluation of uncertainty estimation methods and shows that Deep Ensembles outperformed other methods in most cases. Significance: Neural network uncertainty measures can help identify potentially inaccurate segmentations and alert users to the need for manual review.
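The quality-control result can be illustrated with a minimal sketch, given here only as one plausible reading of the procedure and not as the authors' implementation. It assumes that each image has a scalar uncertainty score and a Dice score, that a segmentation counts as "poor" when its Dice falls below a threshold (the value 0.8 is a hypothetical choice), that flagged images are corrected during manual review, and that the remaining poor fraction is measured over all images. The function names and parameters are illustrative.

```python
import numpy as np


def review_fraction_for_target(uncertainty, dice, poor_dice=0.8, target_poor=0.05):
    """Smallest fraction of most-uncertain images to flag so that the poor
    segmentations left unreviewed make up at most `target_poor` of all images.

    Assumption: `poor_dice` is a hypothetical threshold defining "poor".
    """
    uncertainty = np.asarray(uncertainty, dtype=float)
    dice = np.asarray(dice, dtype=float)
    n = dice.size

    # Rank images from most to least uncertain.
    order = np.argsort(-uncertainty, kind="stable")
    poor_sorted = (dice[order] < poor_dice).astype(int)

    # remaining_poor[k] = poor segmentations left after flagging the top k images.
    flagged_poor = np.concatenate(([0], np.cumsum(poor_sorted)))
    remaining_poor = poor_sorted.sum() - flagged_poor

    # k = n always meets the target (nothing is left unreviewed),
    # so argmax returns the first k that satisfies it.
    meets_target = remaining_poor / n <= target_poor
    k = int(np.argmax(meets_target))
    return k / n


def random_review_fraction(dice, poor_dice=0.8, target_poor=0.05):
    """Expected fraction to review under uniformly random selection.

    Random review of a fraction f leaves an expected poor fraction p * (1 - f),
    where p is the initial poor fraction; solving p * (1 - f) = target gives f.
    """
    p = float(np.mean(np.asarray(dice, dtype=float) < poor_dice))
    if p <= target_poor:
        return 0.0
    return 1.0 - target_poor / p
```

When uncertainty is correlated with segmentation error, the first function returns a smaller fraction than the second, which is the kind of gap the reported 31--48% versus 75--78% figures describe.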