The Dice similarity coefficient (DSC) is widely used both as a metric and as a loss function for biomedical image segmentation because it is robust to class imbalance. However, it is well known that the DSC loss is poorly calibrated, resulting in overconfident predictions that cannot be usefully interpreted in biomedical and clinical practice. Segmentation performance is often the only criterion used to evaluate the outputs of deep neural networks, and calibration is frequently neglected. Yet calibration is important for translation into biomedical and clinical practice, because it gives scientists and clinicians crucial contextual information for interpreting model predictions. In this study, we provide a simple yet effective extension of the DSC loss, named the DSC++ loss, that selectively modulates the penalty associated with overconfident, incorrect predictions. As a standalone loss function, the DSC++ loss achieves significantly improved calibration over the conventional DSC loss across six well-validated open-source biomedical imaging datasets, including both 2D binary and 3D multi-class segmentation tasks. Similarly, we observe significantly improved calibration when integrating the DSC++ loss into four DSC-based loss functions. Finally, we use softmax thresholding to illustrate that well-calibrated outputs enable tailoring of the recall-precision bias, an important post-processing technique for adapting model predictions to the biomedical or clinical task. The DSC++ loss overcomes the major limitation of the DSC loss, providing a suitable loss function for training deep learning segmentation models for use in biomedical and clinical practice. Source code is available at: https://github.com/mlyg/DicePlusPlus.
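The two ingredients the abstract builds on, the conventional soft DSC loss and thresholding of predicted probabilities, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the DSC++ formulation itself (for that, see the linked repository). The function names `soft_dice_loss` and `precision_recall_at`, the smoothing constant `eps`, and the toy arrays are all hypothetical and chosen only for this example.

```python
import numpy as np


def soft_dice_loss(probs, target, eps=1e-7):
    """Conventional soft Dice loss: 1 - 2*sum(p*g) / (sum(p) + sum(g)).

    probs  : predicted foreground probabilities in [0, 1]
    target : binary ground-truth mask (same shape as probs)
    eps    : small smoothing constant (an assumption of this sketch)
    """
    probs = np.asarray(probs, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = np.sum(probs * target)
    denominator = np.sum(probs) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def precision_recall_at(probs, target, threshold):
    """Binarise probabilities at `threshold` and return (precision, recall)."""
    pred = np.asarray(probs, dtype=float) >= threshold
    truth = np.asarray(target).astype(bool)
    tp = np.sum(pred & truth)    # true positives
    fp = np.sum(pred & ~truth)   # false positives
    fn = np.sum(~pred & truth)   # false negatives
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return float(precision), float(recall)


# Toy data (hypothetical): six pixels, three of them foreground.
target = np.array([1, 1, 1, 0, 0, 0])
probs = np.array([0.9, 0.6, 0.3, 0.4, 0.2, 0.1])

loss = soft_dice_loss(probs, target)

# A lower threshold favours recall, a higher one favours precision.
low_precision, low_recall = precision_recall_at(probs, target, 0.25)
high_precision, high_recall = precision_recall_at(probs, target, 0.5)
```

Moving the threshold only yields a meaningful recall-precision trade-off when the probabilities are spread across the interval [0, 1]. If a model is overconfident and nearly all outputs sit close to 0 or 1, most thresholds produce the same binary mask, which is why calibration matters for this post-processing step.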