Though image-level weakly supervised semantic segmentation (WSSS) has achieved great progress with Class Activation Map (CAM) as the cornerstone, the large supervision gap between classification and segmentation still hinders the model from generating more complete and precise pseudo masks for segmentation. In this study, we explore two implicit but intuitive constraints, i.e., cross-view feature semantic consistency and intra-(inter-)class compactness (dispersion), to narrow this supervision gap. To this end, we propose two novel pixel-to-prototype contrast regularization terms that are applied across different views and within each single view of an image, respectively. In addition, we adopt two sample mining strategies, named semi-hard prototype mining and hard pixel sampling, to better leverage hard examples while reducing the incorrect contrasts caused by the absence of precise pixel-wise labels. Our method can be seamlessly incorporated into existing WSSS models without any change to the base network and does not incur any extra inference burden. Experiments on the standard benchmark show that our method consistently improves two strong baselines by large margins, demonstrating its effectiveness. Specifically, built on top of SEAM, we improve the initial seed mIoU on PASCAL VOC 2012 from 55.4% to 61.5%. Moreover, armed with our method, we increase the segmentation mIoU of EPS from 70.8% to 73.6%, achieving a new state-of-the-art.
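To make the core idea concrete, the following is a minimal sketch of a pixel-to-prototype contrastive loss in PyTorch. It only illustrates the general mechanism (prototypes estimated from CAM-confident pixels, an InfoNCE-style pull/push between pixel embeddings and class prototypes); the tensor shapes, the top-k prototype estimator, the temperature value, and the function name pixel_to_prototype_contrast are illustrative assumptions rather than the paper's exact formulation, and the cross-view term and the mining strategies are omitted.

```python
# Minimal sketch of a pixel-to-prototype contrastive loss (illustrative assumptions,
# not the paper's exact formulation).
import torch
import torch.nn.functional as F

def pixel_to_prototype_contrast(feats, cam, top_k=32, temperature=0.1):
    """feats: (B, D, H, W) projected pixel embeddings.
       cam:   (B, C, H, W) class activation maps (background channel included)."""
    B, D, H, W = feats.shape
    C = cam.shape[1]
    feats = F.normalize(feats, dim=1).permute(0, 2, 3, 1).reshape(-1, D)  # (BHW, D)
    cam = cam.permute(0, 2, 3, 1).reshape(-1, C)                          # (BHW, C)
    pseudo_label = cam.argmax(dim=1)                                      # (BHW,)

    # Estimate one prototype per class from its most CAM-confident pixels.
    prototypes = []
    for c in range(C):
        scores = cam[:, c]
        idx = scores.topk(min(top_k, scores.numel())).indices
        prototypes.append(F.normalize(feats[idx].mean(dim=0), dim=0))
    prototypes = torch.stack(prototypes)                                  # (C, D)

    # InfoNCE-style objective: pull each pixel towards its pseudo-class prototype,
    # push it away from the prototypes of the other classes.
    logits = feats @ prototypes.t() / temperature                         # (BHW, C)
    return F.cross_entropy(logits, pseudo_label)
```

Because the loss is computed only on features and CAMs that the base network already produces, it can be added to an existing WSSS training loop as an auxiliary term and dropped entirely at inference time.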