Recent mainstream weakly supervised semantic segmentation (WSSS) approaches are mainly based on Class Activation Maps (CAMs) generated by CNN (Convolutional Neural Network)-based image classifiers. In this paper, we propose a novel transformer-based framework for WSSS, named the Semantic Guided Activation Transformer (SemFormer). We design a transformer-based Class-Aware AutoEncoder (CAAE) to extract class embeddings for the input image and to learn class semantics for all classes in the dataset. The class embeddings and learned class semantics are then used to guide the generation of activation maps via four losses: class-foreground, class-background, activation suppression, and activation complementation losses. Experimental results show that our SemFormer achieves \textbf{74.3}\% mIoU on the PASCAL VOC 2012 dataset, surpassing many recent mainstream WSSS approaches by a large margin. Code will be available at \url{https://github.com/JLChen-C/SemFormer}.