Coupling Global Context and Local Contents for Weakly-Supervised Semantic Segmentation

Owing to their inexpensive annotations and satisfactory performance, Weakly-Supervised Semantic Segmentation (WSSS) approaches have been extensively studied. Recently, single-stage WSSS has been revived to alleviate the expensive computational costs and complicated training procedures of multi-stage WSSS. However, the results of such immature models suffer from \emph{background incompleteness} and \emph{object incompleteness}. We empirically find that these are caused by insufficient global object context and missing local regional contents, respectively. Based on these observations, we propose a single-stage WSSS model supervised only by image-level class labels, termed the \textbf{W}eakly-\textbf{S}upervised \textbf{F}eature \textbf{C}oupling \textbf{N}etwork (\textbf{WS-FCN}). It captures multi-scale context formed from adjacent feature grids and encodes fine-grained spatial information from low-level features into high-level ones. Specifically, a flexible context aggregation module is proposed to capture the global object context in spaces of different granularity. In addition, a semantically consistent feature fusion module is proposed to aggregate fine-grained local contents in a bottom-up, parameter-learnable fashion. Built on these two modules, \textbf{WS-FCN} is trained in a self-supervised, end-to-end fashion. Extensive experiments on the challenging PASCAL VOC 2012 and MS COCO 2014 datasets demonstrate the effectiveness and efficiency of \textbf{WS-FCN}. It achieves state-of-the-art results of $65.02\%$ and $64.22\%$ mIoU on the PASCAL VOC 2012 \emph{val} and \emph{test} sets, respectively, and $34.12\%$ mIoU on the MS COCO 2014 \emph{val} set. The code and weights have been released at:~\href{https://github.com/ChunyanWang1/ws-fcn}{WS-FCN}.
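The abstract names two ideas but does not specify how the modules are built. The first is context gathered from adjacent feature grids at several granularities. The second is a bottom-up fusion of low-level detail into high-level features with learnable parameters. The sketch below is a generic, pyramid-pooling-style illustration of both ideas under stated assumptions, not WS-FCN's actual implementation. The helper names `grid_pool`, `aggregate_context` and `fuse_bottom_up` are hypothetical. So are the grid sizes (1, 2, 4), simple averaging as the combination rule, and a single scalar weight `alpha`. The sketch also works on one feature channel stored as plain Python lists rather than on real feature tensors.

```python
from typing import List, Sequence

# One feature channel as an H x W grid (assumption: single channel for brevity).
Grid = List[List[float]]


def grid_pool(feat: Grid, g: int) -> Grid:
    """Split feat into g x g adjacent cells, average each cell, and
    broadcast the cell mean back to every position in that cell."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for ci in range(g):
        # Integer cell boundaries; assumes h >= g and w >= g.
        r0, r1 = ci * h // g, (ci + 1) * h // g
        for cj in range(g):
            c0, c1 = cj * w // g, (cj + 1) * w // g
            cell = [feat[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(cell) / len(cell)
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = mean
    return out


def aggregate_context(feat: Grid, grid_sizes: Sequence[int] = (1, 2, 4)) -> Grid:
    """Hypothetical multi-granularity context: average the input with its
    grid-pooled versions (g = 1 is fully global, larger g is more local)."""
    pooled = [grid_pool(feat, g) for g in grid_sizes]
    h, w = len(feat), len(feat[0])
    n = len(pooled) + 1
    return [[(feat[r][c] + sum(p[r][c] for p in pooled)) / n
             for c in range(w)] for r in range(h)]


def fuse_bottom_up(high: Grid, low: Grid, alpha: float) -> Grid:
    """Hypothetical bottom-up fusion: add low-level detail to same-sized
    high-level features, scaled by a weight alpha that training would learn
    (fixed here for illustration)."""
    return [[hv + alpha * lv for hv, lv in zip(h_row, l_row)]
            for h_row, l_row in zip(high, low)]


if __name__ == "__main__":
    feat = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(grid_pool(feat, 2))
    print(aggregate_context(feat)[0][0])
    print(fuse_bottom_up([[1.0]], [[2.0]], alpha=0.5))
```

On the 4×4 example, `grid_pool(feat, 2)` replaces each 2×2 quadrant with its mean (3.5, 5.5, 11.5, 13.5). `grid_pool(feat, 1)` spreads the global mean of 8.5 everywhere. Coarser grids therefore inject more global object context, while finer grids keep more local structure. A real network would use learned convolutions, upsampling, and multi-channel tensors instead of fixed averages.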
