Weakly-Supervised Semantic Segmentation (WSSS) methods with image-level labels generally train a classification network to generate Class Activation Maps (CAMs) as initial coarse segmentation labels. However, current WSSS methods still perform far from satisfactorily because the CAMs they adopt 1) typically focus on partial, discriminative object regions and 2) usually contain useless background regions. These two problems are attributed to the sole image-level supervision and the aggregation of global information when training the classification networks. In this work, we propose a visual words learning module and a hybrid pooling approach, and incorporate them into the classification network to mitigate the above problems. In the visual words learning module, we counter the first problem by forcing the classification network to learn fine-grained visual word labels so that more object extents can be discovered. Specifically, the visual words are learned with a codebook, which can be updated via two proposed strategies, i.e., a learning-based strategy and a memory-bank strategy. The second drawback of CAMs is alleviated with the proposed hybrid pooling, which incorporates global average and local discriminative information to simultaneously ensure object completeness and reduce background regions. We evaluated our method on the PASCAL VOC 2012 and MS COCO 2014 datasets. Without any extra saliency prior, our method achieved 70.6% and 70.7% mIoU on the $val$ and $test$ sets of the PASCAL VOC dataset, respectively, and 36.2% mIoU on the $val$ set of the MS COCO dataset, significantly surpassing state-of-the-art WSSS methods.
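To make the moving parts of the abstract concrete, the following is a minimal, dependency-free sketch of three ingredients: computing CAMs from per-position features and classifier weights, pooling a class score map into an image-level logit, and assigning features to codebook entries to form visual-word labels with a memory-bank-style update. Everything here is an illustrative assumption rather than the authors' implementation: the function names (`cam`, `hybrid_pool`, `nearest_word`, `visual_word_labels`, `memory_bank_update`) and the parameters `alpha` and `momentum` are hypothetical. The sketch also assumes that hybrid pooling is a convex mix of global average pooling and global max pooling, that a visual word is the nearest codeword in Euclidean distance, and that the memory-bank strategy is an exponential moving average of the features assigned to each codeword. The learning-based codebook strategy is not shown.

```python
def cam(features, weights):
    """Class activation maps: CAM_c(i) = sum_k w[c][k] * f[i][k].

    features: list of spatial positions, each a list of K channel values.
    weights:  list of C classifier weight vectors, each of length K.
    Returns one raw (unnormalized) score map per class.
    """
    return [[sum(w_k * f_k for w_k, f_k in zip(w, f)) for f in features]
            for w in weights]


def hybrid_pool(scores, alpha=0.5):
    """Hypothetical hybrid pooling: convex mix of average and max pooling.

    The average term lets every position contribute (object completeness);
    the max term emphasizes the most discriminative local response.
    """
    avg = sum(scores) / len(scores)
    return alpha * avg + (1 - alpha) * max(scores)


def nearest_word(feature, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(feature, codebook[j])))


def visual_word_labels(features, codebook):
    """Multi-hot image-level label: word j is 1 if any position maps to it."""
    labels = [0] * len(codebook)
    for f in features:
        labels[nearest_word(f, codebook)] = 1
    return labels


def memory_bank_update(codebook, features, momentum=0.9):
    """Hypothetical memory-bank update: move each codeword toward the mean
    of its assigned features with an exponential moving average."""
    dim = len(codebook[0])
    sums = [[0.0] * dim for _ in codebook]
    counts = [0] * len(codebook)
    for f in features:
        j = nearest_word(f, codebook)
        counts[j] += 1
        sums[j] = [s + x for s, x in zip(sums[j], f)]
    updated = []
    for word, s, n in zip(codebook, sums, counts):
        if n == 0:
            updated.append(list(word))  # unused codeword is left unchanged
        else:
            mean = [x / n for x in s]
            updated.append([momentum * w + (1 - momentum) * m
                            for w, m in zip(word, mean)])
    return updated


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # 4 positions, K = 2
    weights = [[1.0, 0.0], [0.0, 2.0]]                           # C = 2 classes
    maps = cam(features, weights)
    print([hybrid_pool(m, 0.5) for m in maps])  # [0.75, 1.5]
    codebook = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(visual_word_labels(features, codebook))  # [1, 1, 0]
    print(memory_bank_update(codebook, features, momentum=0.5))
```

With `alpha = 1` the pooling reduces to plain global average pooling and with `alpha = 0` to global max pooling; the intermediate setting is meant only to illustrate how average and local discriminative information might be blended.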