Image-level weakly-supervised semantic segmentation (WSSS) has gained popularity in recent years because it greatly reduces the annotation cost of training segmentation models. The typical WSSS approach trains an image classification network that applies global average pooling (GAP) to convolutional feature maps. This makes it possible to estimate object locations from class activation maps (CAMs), which indicate how important each image region is for a given class. The CAMs are then used to generate pseudo-labels, in the form of segmentation masks, that supervise a segmentation model in the absence of pixel-level ground truth. For the SEAM baseline, a previous work proposed to improve CAM learning in two ways: (1) importance sampling, which is a substitute for GAP, and (2) the feature similarity loss, which relies on the heuristic that object contours almost exclusively align with color edges in images. In this work, we propose a different probabilistic interpretation of CAMs for these techniques, rendering the likelihood more appropriate than the multinomial posterior. As a result, we propose an add-on method that can boost essentially any previous WSSS method, improving both the region similarity and the contour quality of all implemented state-of-the-art baselines. We demonstrate this on a wide variety of baselines on the PASCAL VOC dataset. Experiments on the MS COCO dataset show that performance gains can also be achieved in a large-scale setting. Our code is available at https://github.com/arvijj/hfpl.
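To make the GAP-to-CAM step described above concrete, the following is a minimal NumPy sketch, not the implementation from the linked repository. It assumes a single linear classifier on top of globally average-pooled features; under that assumption the CAM of a class is the classifier-weighted sum of the feature maps, and its spatial mean equals the class logit. The function names, the random toy inputs, and the simple thresholding rule used to turn CAMs into a pseudo-label mask (including the value 0.3) are illustrative assumptions only.

```python
import numpy as np

def class_activation_maps(features, weights):
    """features: (C, H, W) feature maps; weights: (K, C) linear classifier.
    Returns one activation map per class, shape (K, H, W)."""
    return np.einsum("kc,chw->khw", weights, features)

def gap_logits(features, weights):
    """Class logits obtained by global average pooling followed by the classifier."""
    pooled = features.mean(axis=(1, 2))   # (C,)
    return weights @ pooled               # (K,)

def pseudo_labels(cams, present, threshold=0.3):
    """Hypothetical pseudo-label rule (an assumption, not the paper's method):
    keep only classes present in the image-level label, rescale each CAM to
    [0, 1], take the per-pixel argmax, and mark weak pixels as background (0)."""
    cams = np.maximum(cams, 0.0) * present[:, None, None]
    peak = cams.max(axis=(1, 2), keepdims=True)
    cams = cams / np.maximum(peak, 1e-8)
    labels = cams.argmax(axis=0) + 1      # foreground classes are 1..K
    labels[cams.max(axis=0) < threshold] = 0
    return labels

# Toy inputs standing in for a trained network (random, for illustration only).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))     # C=8 channels on a 4x4 grid
weights = rng.normal(size=(3, 8))         # K=3 classes
present = np.array([True, False, True])   # image-level labels

cams = class_activation_maps(features, weights)
logits = gap_logits(features, weights)
mask = pseudo_labels(cams, present)

# With a linear classifier, the spatial mean of each CAM equals the GAP logit.
assert np.allclose(cams.mean(axis=(1, 2)), logits)
```

Because pooling and the linear classifier commute, the CAMs come for free from a classifier trained with image-level labels only, which is what makes them usable as a localization signal. Replacing GAP with another aggregation, such as the importance sampling mentioned above, changes how the per-pixel scores are tied to the image-level prediction.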