Classification networks have been used in weakly-supervised semantic segmentation (WSSS) to segment objects by means of class activation maps (CAMs). However, without pixel-level annotations, they are known to (1) focus mainly on discriminative regions and (2) produce diffuse CAMs without well-defined prediction contours. In this work, we alleviate both problems by improving CAM learning. First, we incorporate importance sampling based on the class-wise probability mass function induced by the CAMs to produce stochastic image-level class predictions. As our empirical studies show, this results in segmentations that cover a larger extent of the objects. Second, we formulate a feature similarity loss term, which further improves the alignment of predicted contours with edges in the image. Furthermore, we shed new light on the problem of WSSS by measuring the contour F-score as a complement to the common area mIoU metric. We show that our method significantly outperforms previous methods in terms of contour quality, while matching the state of the art on region similarity.
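The importance-sampling step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the details, so the following are assumptions made only for illustration: the CAMs are passed through a sigmoid, each class map is normalized over pixels to form the class-wise probability mass function, one pixel is drawn per class, and the activation at that pixel serves as the stochastic image-level prediction. The function name `sample_image_level_predictions` and the tensor layout `(C, H, W)` are hypothetical.

```python
import numpy as np


def sample_image_level_predictions(cam_logits, rng):
    """Draw one stochastic image-level score per class from CAM logits.

    cam_logits: array of shape (C, H, W), one activation map per class.
    rng: a numpy.random.Generator.
    Returns (predictions, sampled_indices), both of shape (C,).
    """
    num_classes, height, width = cam_logits.shape

    # Assumption: a sigmoid maps each logit to an activation in (0, 1).
    activations = 1.0 / (1.0 + np.exp(-cam_logits))
    flat = activations.reshape(num_classes, height * width)

    # Class-wise probability mass function over pixel locations.
    pmf = flat / flat.sum(axis=1, keepdims=True)

    # Sample one pixel index per class according to its own pmf.
    sampled_indices = np.array(
        [rng.choice(height * width, p=pmf[c]) for c in range(num_classes)]
    )

    # Assumption: the activation at the sampled pixel is the class prediction.
    predictions = flat[np.arange(num_classes), sampled_indices]
    return predictions, sampled_indices


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cams = rng.normal(size=(3, 4, 5))  # 3 classes on a 4x5 grid
    scores, pixels = sample_image_level_predictions(cams, rng)
    print(scores, pixels)
```

Under these assumptions, pixels with higher activation are drawn more often, yet any pixel of the object can be selected and receive gradient, which matches the abstract's claim that the resulting segmentations cover a larger extent of the objects. In training, the sampled scores would typically feed an image-level classification loss; that part is omitted here.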