Weakly-Supervised Semantic Segmentation (WSSS) using image-level labels typically uses Class Activation Maps (CAM) to generate pseudo labels. Limited by the local structure perception of CNNs, CAM usually fails to identify complete object regions. Although the recent Vision Transformer (ViT) can remedy this flaw, we observe that it also introduces an over-smoothing issue, i.e., the final patch tokens tend to become uniform. In this work, we propose Token Contrast (ToCo) to address this issue and further explore the virtue of ViT for WSSS. First, motivated by the observation that intermediate layers in ViT still retain semantic diversity, we design a Patch Token Contrast module (PTC). PTC supervises the final patch tokens with pseudo token relations derived from intermediate layers, allowing them to align with semantic regions and thus yield more accurate CAM. Second, to further differentiate the low-confidence regions in CAM, we devise a Class Token Contrast module (CTC), inspired by the fact that class tokens in ViT can capture high-level semantics. CTC facilitates representation consistency between uncertain local regions and global objects by contrasting their class tokens. Experiments on the PASCAL VOC and MS COCO datasets show that the proposed ToCo remarkably surpasses other single-stage competitors and achieves performance comparable to state-of-the-art multi-stage methods. Code is available at https://github.com/rulixiang/ToCo.
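To make the idea behind PTC concrete, the following is a minimal NumPy sketch, not the released implementation. It assumes that the pseudo token relations are obtained by thresholding the pairwise cosine similarity of intermediate-layer patch tokens; the function names, the thresholds `pos_thresh` and `neg_thresh`, and the exact loss form are illustrative assumptions. The loss pulls together final tokens whose intermediate counterparts are related and pushes apart those that are not, which penalizes over-smoothed (uniform) final tokens.

```python
import numpy as np


def cosine_similarity_matrix(tokens):
    """Pairwise cosine similarity between rows of an (N, D) token array."""
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.clip(norms, 1e-8, None)  # guard against zero vectors
    return unit @ unit.T


def patch_token_contrast_loss(final_tokens, intermediate_tokens,
                              pos_thresh=0.5, neg_thresh=0.0):
    """Illustrative contrast loss on final patch tokens.

    Assumption: pseudo relations come from thresholding the similarity of
    intermediate tokens. Both thresholds are hypothetical values.
    """
    inter_sim = cosine_similarity_matrix(intermediate_tokens)
    n = inter_sim.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # ignore each token's pair with itself

    positive = (inter_sim > pos_thresh) & off_diag  # pseudo "same region" pairs
    negative = (inter_sim < neg_thresh) & off_diag  # pseudo "different region" pairs

    final_sim = cosine_similarity_matrix(final_tokens)

    # Positive pairs should be similar: penalize (1 - similarity).
    pos_term = (1.0 - final_sim[positive]).mean() if positive.any() else 0.0
    # Negative pairs should be dissimilar: penalize positive similarity only.
    neg_term = (np.clip(final_sim[negative], 0.0, None).mean()
                if negative.any() else 0.0)
    return float(pos_term + neg_term)


if __name__ == "__main__":
    # Toy intermediate tokens forming two semantic groups.
    intermediate = np.array([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0], [-1.0, 0.1]])
    # Over-smoothed final tokens: every patch token is identical.
    smoothed = np.ones((4, 2))

    print("over-smoothed loss:", patch_token_contrast_loss(smoothed, intermediate))
    print("diverse loss:", patch_token_contrast_loss(intermediate, intermediate))
```

In this toy setting, the uniform final tokens incur a larger loss than final tokens that preserve the two groups, which is the behaviour the contrast term is meant to encourage.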