The Dice similarity coefficient (DSC) is widely used as both a metric and a loss function for biomedical image segmentation because of its robustness to class imbalance. However, it is well known that the DSC loss is poorly calibrated, resulting in overconfident predictions that cannot be usefully interpreted in biomedical and clinical practice. Performance is often the only criterion used to evaluate segmentations produced by deep neural networks, and calibration is often neglected. Yet calibration is important for translation into biomedical and clinical practice, because it provides crucial contextual information that scientists and clinicians need to interpret model predictions. In this study, we identify poor calibration as an emerging challenge of deep learning-based biomedical image segmentation. We provide a simple yet effective extension of the DSC loss, named the DSC++ loss, that selectively modulates the penalty associated with overconfident, incorrect predictions. As a standalone loss function, the DSC++ loss achieves significantly improved calibration over the conventional DSC loss across five well-validated open-source biomedical imaging datasets. Similarly, we observe significantly improved calibration when integrating the DSC++ loss into four DSC-based loss functions. Finally, we use softmax thresholding to illustrate that well-calibrated outputs enable tailoring of the precision-recall bias, an important post-processing technique for adapting model predictions to suit the biomedical or clinical task. The DSC++ loss overcomes the major limitation of the DSC, providing a suitable loss function for training deep learning segmentation models for use in biomedical and clinical practice.
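To make the softmax thresholding step mentioned above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It computes the standard DSC between two binary masks and shows how moving the threshold applied to foreground probabilities trades precision against recall. The function names `dice_coefficient` and `precision_recall_at`, the smoothing term `eps`, and the toy probabilities are all assumptions made for illustration. The DSC++ loss itself is not reproduced here, since the abstract does not state its formula.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Standard DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks.

    `eps` is an illustrative smoothing term that avoids division by zero.
    """
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)


def precision_recall_at(probs, true_mask, threshold):
    """Binarise foreground probabilities at `threshold`, then score the mask."""
    pred = np.asarray(probs) >= threshold
    true = np.asarray(true_mask).astype(bool)
    tp = np.logical_and(pred, true).sum()
    fp = np.logical_and(pred, ~true).sum()
    fn = np.logical_and(~pred, true).sum()
    # Convention assumed here: an empty denominator scores 1.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 1.0
    return float(precision), float(recall)


# Toy foreground probabilities for six pixels (invented for illustration).
probs = np.array([0.95, 0.70, 0.40, 0.20, 0.60, 0.05])
truth = np.array([1, 1, 1, 0, 0, 0])

for t in (0.3, 0.5, 0.8):
    precision, recall = precision_recall_at(probs, truth, t)
    dsc = dice_coefficient(probs >= t, truth)
    print(f"threshold={t:.1f} precision={precision:.2f} "
          f"recall={recall:.2f} DSC={dsc:.2f}")
```

On this toy input, lowering the threshold favours recall and raising it favours precision. That adjustment is only meaningful when the probabilities are spread across the interval rather than saturated near 0 or 1, which is why calibration matters for this kind of post-processing.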