Weakly Supervised Semantic Segmentation (WSSS) is challenging, particularly when image-level labels are used to supervise pixel-level prediction. To bridge this gap, a Class Activation Map (CAM) is usually generated to provide pixel-level pseudo labels. CAMs in convolutional neural networks suffer from partial activation, i.e., only the most discriminative regions are activated. Transformer-based methods, on the other hand, are highly effective at exploring global context with long-range dependency modeling, potentially alleviating the "partial activation" issue. In this paper, we propose the first transformer-based WSSS approach and introduce the Gradient-weighted Element-wise Transformer Attention Map (GETAM). GETAM shows fine-scale activation for all feature-map elements, revealing different parts of the object across transformer layers. Further, we propose an activation-aware label completion module to generate high-quality pseudo labels. Finally, we incorporate our methods into an end-to-end framework for WSSS using double backward propagation. Extensive experiments on PASCAL VOC and COCO demonstrate that our results beat the state-of-the-art end-to-end approaches by a significant margin and outperform most multi-stage methods.
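
To make the core idea more concrete, the sketch below illustrates one plausible way to weight transformer attention maps element-wise by their gradients and aggregate them across layers into a coarse localization map, which is roughly the intuition behind GETAM. The function name, tensor shapes, class-token indexing, and ReLU/normalization choices are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def getam_like_map(attn_maps, grads, num_patches_hw):
    """Minimal sketch (assumed interface, not the paper's exact method).

    attn_maps: list of attention tensors [B, heads, N, N] saved during the forward pass
    grads:     list of gradients d(class score)/d(attention), same shapes as attn_maps
    num_patches_hw: (H, W) patch grid; token 0 is assumed to be the class token
    """
    h, w = num_patches_hw
    cam = None
    for A, dA in zip(attn_maps, grads):
        # Element-wise weighting of every attention entry by its gradient,
        # keeping only positive contributions, then averaging over heads.
        weighted = torch.relu(A * dA).mean(dim=1)            # [B, N, N]
        # Class-token row -> relevance of every patch token.
        cls_to_patch = weighted[:, 0, 1:]                     # [B, H*W]
        cam = cls_to_patch if cam is None else cam + cls_to_patch
    cam = cam.reshape(-1, h, w)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-6)   # normalize to [0, 1]
    return cam
```

In practice, the attention tensors and their gradients would be collected with forward hooks and a backward pass from the target class score; summing the gradient-weighted maps over layers is what lets different layers contribute different object parts, in contrast to a single-layer CAM.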