Weakly supervised semantic segmentation under image-tag supervision is challenging because it must directly associate high-level semantics with low-level appearance. To bridge this gap, we propose an iterative bottom-up and top-down framework that alternately expands object regions and optimizes the segmentation network. We start from the initial localization produced by classification networks. Although classification networks respond only to small, coarse discriminative object regions, we argue that these regions contain significant common features of the objects. In the bottom-up step, we therefore mine common object features from the initial localization and expand the object regions with the mined features. To supplement non-discriminative regions, saliency maps are then incorporated under a Bayesian framework to refine the object regions. In the top-down step, the refined object regions are used as supervision to train the segmentation network and to predict object masks. These object masks localize objects more accurately and cover more of each object. We then take these object masks as the new initial localization and mine common object features from them again. These steps are repeated iteratively to progressively produce finer object masks and optimize the segmentation network. Experimental results on the PASCAL VOC 2012 dataset demonstrate that the proposed method outperforms previous state-of-the-art methods by a large margin.
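The alternation described above can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it (`bottom_up`, `top_down`, `bayes_posterior`, the scalar per-region features, the Gaussian-style likelihoods, and the 0.5 threshold) is an assumption made for illustration. A mean-feature prototype stands in for the mined common object features, the saliency value is used as the prior in Bayes' rule, and the segmentation network is replaced by a pass-through placeholder.

```python
import math


def bayes_posterior(like_obj, like_bg, prior_obj):
    """Bayes' rule for a binary object/background decision."""
    num = prior_obj * like_obj
    den = num + (1.0 - prior_obj) * like_bg
    return num / den if den > 0 else 0.0


def mean(values):
    return sum(values) / len(values)


def bottom_up(features, mask, saliency, threshold=0.5):
    """Toy bottom-up step: mine a common feature from the current object
    regions, score every region against it, and refine with saliency."""
    obj = [f for f, m in zip(features, mask) if m]
    bg = [f for f, m in zip(features, mask) if not m]
    if not obj or not bg:
        # Nothing to contrast against; keep the current mask unchanged.
        return list(mask)
    proto_obj, proto_bg = mean(obj), mean(bg)  # assumed "common features"
    new_mask = []
    for f, s in zip(features, saliency):
        like_obj = math.exp(-(f - proto_obj) ** 2)  # assumed likelihood form
        like_bg = math.exp(-(f - proto_bg) ** 2)
        # Saliency acts as the prior probability that a region is object.
        new_mask.append(bayes_posterior(like_obj, like_bg, s) > threshold)
    return new_mask


def top_down(mask):
    """Placeholder for the top-down step. A real system would train a
    segmentation network on `mask` and return its predicted object mask."""
    return list(mask)


def iterate(features, seed_mask, saliency, n_iters=3):
    """Alternate the two steps, feeding each predicted mask back in as
    the next round's initial localization."""
    mask = list(seed_mask)
    for _ in range(n_iters):
        refined = bottom_up(features, mask, saliency)
        mask = top_down(refined)
    return mask


# Six regions: the first three belong to the object, but only the first
# is covered by the initial (discriminative) localization.
features = [1.0, 0.9, 0.8, 0.1, 0.0, 0.05]
seed_mask = [True, False, False, False, False, False]
saliency = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
final_mask = iterate(features, seed_mask, saliency)
```

On this toy input the mask grows from the single seed region to all three object regions after the first round and stays stable afterwards, which mirrors the expansion behaviour the abstract describes only in a very simplified form.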