Equipping predicted segmentation with calibrated uncertainty is essential for safety-critical applications. In this work, we focus on capturing the data-inherent uncertainty (aka aleatoric uncertainty) in segmentation, typically when ambiguities exist in input images. Due to the high-dimensional output space and potential multiple modes in segmenting ambiguous images, it remains challenging to predict well-calibrated uncertainty for segmentation. To tackle this problem, we propose a novel mixture of stochastic experts (MoSE) model, where each expert network estimates a distinct mode of the aleatoric uncertainty and a gating network predicts the probabilities of an input image being segmented in those modes. This yields an efficient two-level uncertainty representation. To learn the model, we develop a Wasserstein-like loss that directly minimizes the distribution distance between the MoSE and ground truth annotations. The loss can easily integrate traditional segmentation quality measures and be efficiently optimized via constraint relaxation. We validate our method on the LIDC-IDRI dataset and a modified multimodal Cityscapes dataset. Results demonstrate that our method achieves the state-of-the-art or competitive performance on all metrics.
翻译:在这项工作中,我们侧重于在输入图像存在模糊之处的情况下,捕捉数据内在的不确定性(可释放的不确定性),通常在输入图像存在模糊之处时。由于高维输出空间和在对模糊图像进行分解时可能存在多种模式,仍然难以预测分解的不确定性。为了解决这一问题,我们提议了一种新型的随机专家模型(MOSE)组合,每个专家网络都估计了一种截然不同的解析不确定性模式,而格子网络则预测了在这些模式中分离的输入图像的概率。这产生了一种高效的两级不确定性代表。为了了解模型,我们开发了一种瓦塞斯坦式的损失,直接将MOSE和地面真相说明之间的分布距离缩小到最小。损失很容易将传统的分解质量措施整合起来,并通过限制的放松而有效地优化。我们验证了我们关于LIDDC-IDRI数据集和经修改的多式城景数据集的方法。结果表明,我们的方法实现了所有测量的状态或竞争性业绩。</s>