Clinical interest often lies in measuring the volume of a structure, which is typically derived from a segmentation. To evaluate and compare segmentation methods, the similarity between a segmentation and a predefined ground truth is measured using popular discrete metrics, such as the Dice score. Recent segmentation methods use a differentiable surrogate metric, such as soft Dice, as part of the loss function during the learning phase. In this work, we first briefly describe how to derive volume estimates from a segmentation that is potentially inherently uncertain or ambiguous. This is followed by a theoretical analysis and an experimental validation linking the inherent uncertainty to common loss functions for training CNNs, namely cross-entropy and soft Dice. We find that, even though soft Dice optimization improves performance with respect to the Dice score and other measures, it may introduce a volume bias for tasks with high inherent uncertainty. These findings indicate some of the method's clinical limitations and suggest doing a closer ad-hoc volume analysis with an optional re-calibration step.
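The volume bias described above can be illustrated with a small numerical sketch. This is a toy setup and not the experiment from this work. It assumes a structure with a core of `n_core` voxels that is always foreground and a rim of `n_rim` voxels that is jointly foreground with probability `p`, and it assumes the volume estimate is the sum of predicted probabilities. The function names and default sizes are hypothetical. The core is predicted as 1, every rim voxel receives the same prediction `q`, and a grid search finds the `q` that is optimal in expectation under each loss.

```python
import numpy as np


def expected_cross_entropy(q, p):
    """Expected per-voxel cross-entropy on the rim when it is foreground with probability p."""
    q = np.clip(q, 1e-12, 1.0 - 1e-12)  # avoid log(0) at the grid end points
    return -(p * np.log(q) + (1.0 - p) * np.log(1.0 - q))


def expected_soft_dice(q, p, n_core, n_rim):
    """Expected soft Dice over the two possible ground truths (rim present or absent)."""
    predicted_volume = n_core + n_rim * q
    # Ground truth A (probability p): core and rim are foreground.
    dice_a = 2.0 * (n_core + n_rim * q) / ((n_core + n_rim) + predicted_volume)
    # Ground truth B (probability 1 - p): only the core is foreground.
    dice_b = 2.0 * n_core / (n_core + predicted_volume)
    return p * dice_a + (1.0 - p) * dice_b


def volume_estimates(p, n_core=100, n_rim=50, n_grid=1001):
    """Return the expected true volume and the volumes implied by each loss's optimal q."""
    q = np.linspace(0.0, 1.0, n_grid)
    q_ce = q[np.argmin(expected_cross_entropy(q, p))]
    q_sd = q[np.argmax(expected_soft_dice(q, p, n_core, n_rim))]
    return {
        "expected": n_core + n_rim * p,
        "cross_entropy": float(n_core + n_rim * q_ce),
        "soft_dice": float(n_core + n_rim * q_sd),
    }


if __name__ == "__main__":
    for p in (0.3, 0.7):
        print(p, volume_estimates(p))
```

In this toy setting the cross-entropy optimum sits at `q = p`, so the implied volume matches the expected volume. The soft Dice optimum lands at `q = 0` for `p = 0.3` and at `q = 1` for `p = 0.7`, so the implied volume is under- or overestimated by the uncertain part of the rim. Whether a similar effect appears in a real task depends on the data and the model, which is what the volume analysis and re-calibration step are meant to check.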