Contrastive learning has shown superior performance in embedding global and spatially invariant features in computer vision (e.g., image classification). However, its success in embedding local and spatially variant features remains limited, especially for semantic segmentation. In a per-pixel prediction task, a single image can contain more than one label (e.g., an image containing a cat, a dog, and grass), so it is difficult to define 'positive' or 'negative' pairs in a canonical contrastive learning setting. In this paper, we propose an attention-guided supervised contrastive learning approach that highlights a single semantic object at a time as the target. With our design, the same image can be embedded into different semantic clusters by supplying semantic attention (i.e., coarse semantic masks) as an additional input channel. To obtain such attention, we present a novel two-stage training strategy. We evaluate the proposed method primarily on multi-organ medical image segmentation, using both in-house data and the BTCV 2015 dataset. Compared with state-of-the-art supervised and semi-supervised training methods using a ResNet-50 backbone, our proposed pipeline yields substantial improvements of 5.53% and 6.09% in Dice score on the two medical image segmentation cohorts, respectively. We also assess the proposed method on natural images using the PASCAL VOC 2012 dataset, where it achieves a substantial improvement of 2.75%.
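The core idea, pairing one image with a different semantic mask to obtain a different contrastive target, can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the function names `attention_inputs` and `supcon_loss` are hypothetical, the masks are derived from a ground-truth label map as a stand-in for the coarse masks produced by the two-stage training, and the loss is a generic supervised contrastive loss rather than necessarily the exact objective used in the paper.

```python
import math


def attention_inputs(image, label_map, classes):
    """Pair one image with one binary mask per class (hypothetical helper).

    image:     list of channels, each an H x W nested list of floats.
    label_map: H x W nested list of integer class ids (stand-in for a
               coarse semantic mask; an assumption of this sketch).
    Returns a list of (channels_with_mask, class_id) samples, so the same
    image appears once per highlighted semantic object.
    """
    samples = []
    for c in classes:
        mask = [[1.0 if v == c else 0.0 for v in row] for row in label_map]
        # The mask is appended as one additional input channel.
        samples.append((image + [mask], c))
    return samples


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of embeddings.

    Embeddings sharing a label are positives; all others are negatives.
    Anchors without any positive in the batch are skipped.
    """
    # L2-normalise so that dot products are cosine similarities.
    normed = []
    for e in embeddings:
        norm = math.sqrt(sum(x * x for x in e)) or 1.0
        normed.append([x / norm for x in e])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n) if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In this sketch, the class id attached to each sample by `attention_inputs` would serve as the label passed to `supcon_loss` once an encoder (not shown) has mapped each masked input to an embedding, so that embeddings of the same semantic object are pulled together across images.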