Existing studies in weakly-supervised semantic segmentation (WSSS) using image-level weak supervision have several limitations: sparse object coverage, inaccurate object boundaries, and co-occurring pixels from non-target objects. To overcome these challenges, we propose a novel framework, namely Explicit Pseudo-pixel Supervision (EPS), which learns from pixel-level feedback by combining two weak supervision signals: the image-level label provides the object identity via the localization map, and the saliency map from an off-the-shelf saliency detection model offers rich object boundaries. We devise a joint training strategy to fully utilize the complementary relationship between both types of information. Our method can obtain accurate object boundaries and discard co-occurring pixels, thereby significantly improving the quality of pseudo-masks. Experimental results show that the proposed method remarkably outperforms existing methods by resolving key challenges of WSSS, and achieves new state-of-the-art performance on both the PASCAL VOC 2012 and MS COCO 2014 datasets.
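The core idea, using a saliency map to separate foreground from background while the localization maps supply class identity, can be sketched as follows. This is an illustrative simplification rather than the paper's joint training objective; the function name, the fixed threshold, and the per-pixel argmax assignment are assumptions for the sketch.

```python
import numpy as np

def pseudo_mask(cams, saliency, fg_thresh=0.5):
    """Illustrative combination of class localization maps (CAMs) with an
    off-the-shelf saliency map to form a pseudo-mask (a sketch, not EPS's
    actual joint training strategy).

    cams:     (C, H, W) array of per-class localization maps in [0, 1]
    saliency: (H, W) saliency map in [0, 1]; high values = salient foreground
    Returns an (H, W) label map: 0 = background, 1..C = object classes.
    """
    fg = saliency > fg_thresh          # salient pixels become foreground
    cls = cams.argmax(axis=0) + 1      # best-scoring class at each pixel
    return np.where(fg, cls, 0)        # non-salient pixels become background
```

Because the foreground/background decision comes from the saliency map, the resulting pseudo-mask inherits its sharp boundaries, while pixels of co-occurring non-target regions that are not salient are pushed to background.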