Semantic segmentation has made significant progress in recent years thanks to deep neural networks, but the common objective of generating a single segmentation output that accurately matches the image's content may not be suitable for safety-critical domains such as medical diagnostics and autonomous driving. Instead, multiple possible correct segmentation maps may be required to reflect the true distribution of annotation maps. In this context, stochastic semantic segmentation methods must learn to predict conditional distributions of labels given the image, but this is challenging due to the typically multimodal distributions, high-dimensional output spaces, and limited annotation data. To address these challenges, we propose a conditional categorical diffusion model (CCDM) for semantic segmentation based on Denoising Diffusion Probabilistic Models. Our model is conditioned to the input image, enabling it to generate multiple segmentation label maps that account for the aleatoric uncertainty arising from divergent ground truth annotations. Our experimental results show that CCDM achieves state-of-the-art performance on LIDC, a stochastic semantic segmentation dataset, and outperforms established baselines on the classical segmentation dataset Cityscapes.
翻译:摘要:近年来,由于深度神经网络的发展,语义分割取得了重要的进展。但是,在医学诊断和自动驾驶等安全关键领域中,生成一个单一的准确匹配图像内容的分割输出可能并不适用。相反,可能需要多个可能正确的分割图像,以反映注释图像的真实分布。在这种情况下,随机语义分割方法必须学习预测给定图像的标签的条件分布,但这很具有挑战性,由于通常是多模态分布,高维输出空间和有限注释数据。为了应对这些挑战,我们提出了一种基于去噪扩散概率模型的条件分类扩散模型(CCDM),用于语义分割。我们的模型受输入图像的条件限制,从而使其能够生成多个分割标签映射,以考虑到源于不同标注地面真值的车应性不确定性。实验结果表明,CCDM 在随机语义分割数据集 LIDC 上实现了最先进的性能,并在传统分割数据集 Cityscapes 上优于已经建立的基线。