Model calibration measures the agreement between the predicted probability estimates and the true correctness likelihood. Proper model calibration is vital for high-risk applications. Unfortunately, modern deep neural networks are poorly calibrated, compromising trustworthiness and reliability. Medical image segmentation particularly suffers from this due to the natural uncertainty of tissue boundaries. This is exasperated by their loss functions, which favor overconfidence in the majority classes. We address these challenges with DOMINO, a domain-aware model calibration method that leverages the semantic confusability and hierarchical similarity between class labels. Our experiments demonstrate that our DOMINO-calibrated deep neural networks outperform non-calibrated models and state-of-the-art morphometric methods in head image segmentation. Our results show that our method can consistently achieve better calibration, higher accuracy, and faster inference times than these methods, especially on rarer classes. This performance is attributed to our domain-aware regularization to inform semantic model calibration. These findings show the importance of semantic ties between class labels in building confidence in deep learning models. The framework has the potential to improve the trustworthiness and reliability of generic medical image segmentation models. The code for this article is available at: https://github.com/lab-smile/DOMINO.
翻译:模型校准测量了预测概率估计和真实正确可能性之间的协议。 正确的模型校准对于高风险应用至关重要。 不幸的是, 现代深神经网络校准不周, 降低了可信度和可靠性。 医学图像分割因组织界限的自然不确定性而特别受到影响。 这因损失功能而激怒, 偏重于多数类的自信。 我们与DOMINO一起应对这些挑战。 DOMINO是一个域认知模型校准方法, 利用语义调调和等级相似的等级标签。 我们的实验显示, DOMINO 校准的深神经网络超越了不校准模型和头部图像分割中最先进的光学方法。 我们的结果表明,我们的方法可以持续地实现更好的校准、 更高的准确度和比这些方法更快的推断时间, 特别是在稀有类中。 这种表现归功于我们的域识校准规范, 以告知语义模型校准。 这些研究结果显示, 班级标签在建立深层学习模型中, 校准模型中, 校准模型的重要性。 这个框架可以提高常规/ 格式的可靠性 。