Weakly supervised learning has emerged as an appealing alternative to alleviate the need for large labeled datasets in semantic segmentation. Most current approaches exploit class activation maps (CAMs), which can be generated from image-level annotations. Nevertheless, the resulting maps have been shown to be highly discriminative, covering only the most salient object regions and thus failing to serve as optimal proxy pixel-level labels. We present a novel learning strategy that leverages self-supervision in a multi-modal image scenario to significantly enhance the original CAMs. In particular, the proposed method is based on two observations. First, the learning of fully supervised segmentation networks implicitly imposes equivariance through data augmentation, whereas this implicit constraint disappears for CAMs generated from image tags. Second, the commonalities between image modalities can be employed as an efficient self-supervisory signal, correcting the inconsistencies exhibited by CAMs obtained across multiple modalities. To effectively train our model, we integrate a novel loss function that includes a within-modality and a cross-modality equivariant term, explicitly imposing these constraints during training. In addition, we add a KL-divergence term on the class prediction distributions to facilitate information exchange between modalities, which, combined with the equivariant regularizers, further improves the performance of our model. Exhaustive experiments on the popular multi-modal BRATS dataset demonstrate that our approach outperforms relevant recent literature under the same learning conditions.
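The combined objective described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the transform `T` (here a horizontal flip), the L2 penalty on CAMs, the pooling used to obtain class prediction distributions, and the weight `lam_kl` are all assumptions made for the sake of the example.

```python
import numpy as np

def hflip(x):
    # Example spatial transform T (horizontal flip); equivariance means
    # CAM(T(image)) should equal T(CAM(image)).
    return x[..., ::-1]

def l2(a, b):
    # Mean squared error between two activation maps.
    return float(np.mean((a - b) ** 2))

def kl_div(p, q, eps=1e-8):
    # KL(p || q) between two class probability distributions.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def equivariant_loss(cam_fn, img_a, img_b, lam_kl=0.1):
    """Hedged sketch of the loss: within-modality equivariance,
    cross-modality equivariance, and a KL term on class predictions.
    cam_fn maps an image of shape (classes, H, W) to CAMs of the
    same shape; img_a / img_b are co-registered modalities."""
    T = hflip
    cam_a, cam_b = cam_fn(img_a), cam_fn(img_b)
    cam_a_t, cam_b_t = cam_fn(T(img_a)), cam_fn(T(img_b))
    # Within-modality term: CAM(T(x)) vs T(CAM(x)) for each modality.
    l_within = l2(cam_a_t, T(cam_a)) + l2(cam_b_t, T(cam_b))
    # Cross-modality term: CAMs of one modality's transformed input
    # should match the transformed CAMs of the other modality.
    l_cross = l2(cam_a_t, T(cam_b)) + l2(cam_b_t, T(cam_a))
    # KL term on class prediction distributions (here obtained by
    # global average pooling of the CAMs, an assumed choice).
    p_a = cam_a.mean(axis=(-2, -1))
    p_b = cam_b.mean(axis=(-2, -1))
    p_a, p_b = p_a / p_a.sum(), p_b / p_b.sum()
    return l_within + l_cross + lam_kl * kl_div(p_a, p_b)
```

With an equivariant CAM extractor and identical modalities, every term vanishes; any predictor that is not equivariant to `T` incurs a positive penalty, which is the self-supervisory signal the method exploits.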