Modern incremental learning methods for semantic segmentation usually learn new categories from dense annotations. Although they achieve promising results, pixel-by-pixel labeling is costly and time-consuming. Weakly incremental learning for semantic segmentation (WILSS) is a novel and attractive task that aims to learn to segment new classes from cheap and widely available image-level labels. Despite yielding comparable results, image-level labels cannot provide the details needed to locate each segment, which limits the performance of WILSS. This inspires us to consider how to improve and effectively utilize the supervision of new classes given only image-level labels, while avoiding forgetting old ones. In this work, we propose a novel and data-efficient framework for WILSS, named FMWISS. Specifically, we propose pre-training-based co-segmentation, which distills the knowledge of complementary foundation models to generate dense pseudo labels. We further refine the noisy pseudo masks with a teacher-student architecture, in which a plug-in teacher is optimized with a proposed dense contrastive loss. Moreover, we introduce memory-based copy-paste augmentation to mitigate the catastrophic forgetting of old classes. Extensive experiments on the Pascal VOC and COCO datasets demonstrate the superior performance of our framework, e.g., FMWISS achieves 70.7% and 73.3% in the 15-5 VOC setting, outperforming the state-of-the-art method by 3.4% and 6.1%, respectively.
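To make the dense contrastive loss concrete, below is a minimal PyTorch sketch of one plausible instantiation: a pixel-wise supervised-contrastive objective that pulls student embeddings toward teacher embeddings of the same pseudo class and pushes them away from other classes. The function name `dense_contrastive_loss`, the temperature, the ignore index of 255, and the pixel-subsampling cap are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dense_contrastive_loss(student_feats, teacher_feats, pseudo_labels,
                           temperature=0.1, max_pixels=4096):
    """Hypothetical dense (pixel-wise) contrastive loss between student and
    teacher feature maps, guided by pseudo labels.

    student_feats, teacher_feats: (B, C, H, W) dense embeddings.
    pseudo_labels: (B, H, W) integer pseudo masks (255 = ignore, assumed).
    """
    B, C, H, W = student_feats.shape
    # Flatten spatial dimensions and L2-normalize the per-pixel embeddings.
    s = F.normalize(student_feats.permute(0, 2, 3, 1).reshape(-1, C), dim=1)
    # The teacher is treated as a fixed target, hence detach().
    t = F.normalize(teacher_feats.detach().permute(0, 2, 3, 1).reshape(-1, C), dim=1)
    y = pseudo_labels.reshape(-1)

    # Drop ignored pixels.
    valid = y != 255
    s, t, y = s[valid], t[valid], y[valid]

    # Subsample pixels to keep the N x N similarity matrix tractable.
    if s.shape[0] > max_pixels:
        idx = torch.randperm(s.shape[0], device=s.device)[:max_pixels]
        s, t, y = s[idx], t[idx], y[idx]

    # Student-teacher pairwise similarities; same pseudo class => positive pair.
    logits = s @ t.T / temperature
    pos_mask = (y[:, None] == y[None, :]).float()

    # Average log-likelihood over the positives of each anchor pixel.
    log_prob = F.log_softmax(logits, dim=1)
    loss = -(pos_mask * log_prob).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()
```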
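Similarly, the memory-based copy-paste augmentation can be sketched as a small replay buffer of old-class segments that get pasted into new-step training images, so the model keeps seeing old categories. The class name `OldClassMemory`, the buffer capacity, the reservoir-style replacement, and the assumption that stored crops match the target resolution are all illustrative choices for this sketch, not the paper's exact implementation.

```python
import random
import torch

class OldClassMemory:
    """Hypothetical memory bank for copy-paste replay of old classes."""

    def __init__(self, capacity_per_class=20):
        self.capacity = capacity_per_class
        self.bank = {}  # class_id -> list of (image, binary_mask) tensors

    def store(self, image, mask, class_id):
        """Save an old-class exemplar: image (3, H, W), mask (H, W) bool."""
        crops = self.bank.setdefault(class_id, [])
        if len(crops) < self.capacity:
            crops.append((image.clone(), mask.clone()))
        else:
            # Reservoir-style replacement keeps the buffer bounded.
            crops[random.randrange(self.capacity)] = (image.clone(), mask.clone())

    def paste(self, image, label):
        """Paste one stored old-class segment onto (image, label) in place.

        Assumes stored exemplars share the target resolution; a real
        implementation would resize and randomly translate the segment.
        """
        if not self.bank:
            return image, label
        class_id = random.choice(list(self.bank.keys()))
        src_img, src_mask = random.choice(self.bank[class_id])
        m = src_mask.bool()
        image[:, m] = src_img[:, m]  # overwrite pixels under the mask
        label[m] = class_id          # and the corresponding label entries
        return image, label
```

In a training loop, `store` would be called on old-step data (or on confident old-class predictions) before the new step begins, and `paste` would be applied to each new-step image so that supervision for old classes is rehearsed alongside the pseudo-labeled new classes.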