Medical image segmentation, i.e., computing voxel-level semantic masks, is a fundamental yet challenging task. To improve the ability of encoder-decoder neural networks to perform this task across large clinical cohorts, contrastive learning offers a way to stabilize model initialization and strengthen encoders without labels. However, a single image may contain multiple target objects with different semantic meanings, which makes it difficult to adapt traditional contrastive learning methods from the prevalent 'image-level classification' setting to 'pixel-level segmentation'. In this paper, we propose a simple semantic-aware contrastive learning approach that leverages attention masks to advance multi-object semantic segmentation. Briefly, instead of learning traditional image-level embeddings, we embed different semantic objects into different clusters. We evaluate the proposed method on a multi-organ medical image segmentation task using both in-house data and the MICCAI 2015 Challenge BTCV dataset. Compared with current state-of-the-art training strategies, our pipeline improves the Dice score by 5.53% and 6.09% on the two medical image segmentation cohorts, respectively (p-value < 0.01). We further assess the method on natural images using the PASCAL VOC 2012 dataset, where it improves mIoU by 2.75% (p-value < 0.01).
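The abstract does not give the loss in detail, so the following is only a minimal sketch of the general idea of embedding semantic objects into per-class clusters, under stated assumptions. It assumes that per-object embeddings are obtained by masked average pooling of an encoder feature map (the masks here are a hypothetical stand-in for the attention masks mentioned above), and it swaps in a supervised-contrastive (SupCon-style) loss that treats objects with the same semantic label as positives. The function names `masked_pool` and `semantic_contrastive_loss` and the temperature of 0.1 are illustrative choices, not the authors' implementation.

```python
import numpy as np


def masked_pool(features, masks):
    """Average-pool an encoder feature map inside each object mask.

    features: (C, H, W) feature map (assumed encoder output).
    masks:    (K, H, W) soft or binary masks, one per object
              (hypothetical stand-in for attention masks).
    Returns:  (K, C) L2-normalized embedding per object.
    """
    weights = masks.reshape(masks.shape[0], -1)          # (K, HW)
    feats = features.reshape(features.shape[0], -1)      # (C, HW)
    pooled = weights @ feats.T / (weights.sum(axis=1, keepdims=True) + 1e-8)
    return pooled / (np.linalg.norm(pooled, axis=1, keepdims=True) + 1e-8)


def semantic_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss: same-label objects are positives, all others negatives.

    Minimizing it pulls objects of one semantic class into a shared cluster
    and pushes different classes apart, instead of contrasting whole images.
    """
    n = embeddings.shape[0]
    if n < 2:
        return 0.0
    sim = embeddings @ embeddings.T / temperature
    np.fill_diagonal(sim, -np.inf)                        # exclude self-pairs
    row_max = sim.max(axis=1, keepdims=True)
    # Numerically stable row-wise log-softmax over the non-self pairs.
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    losses = [-log_prob[i, positives[i]].mean() for i in range(n) if positives[i].any()]
    return float(np.mean(losses)) if losses else 0.0


# Toy example: two objects (left and right halves) with distinct features.
features = np.zeros((2, 4, 4))
features[0, :, :2] = 1.0
features[1, :, 2:] = 1.0
masks = np.zeros((2, 4, 4))
masks[0, :, :2] = 1.0
masks[1, :, 2:] = 1.0
obj = masked_pool(features, masks)                        # one row per object
batch = np.vstack([obj, obj])                             # same two objects in two images
print(semantic_contrastive_loss(batch, np.array([0, 1, 0, 1])))
```

In the toy batch, labeling the objects consistently with their features yields a near-zero loss, while a labeling that groups dissimilar objects yields a large one; this is the clustering behavior the abstract describes, not a reproduction of the paper's training pipeline.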