The main obstacle to weakly supervised semantic image segmentation is the difficulty of obtaining pixel-level information from coarse image-level annotations. Most methods based on image-level annotations use localization maps obtained from a classifier, but these maps focus only on small discriminative parts of objects and do not capture precise boundaries. FickleNet explores diverse combinations of locations on feature maps created by generic deep neural networks. It selects hidden units randomly and then uses them to obtain activation scores for image classification. FickleNet implicitly learns the coherence of each location in the feature maps, resulting in a localization map which identifies both discriminative and other parts of objects. An ensemble effect is obtained from a single network by selecting random hidden unit pairs, which means that a variety of localization maps are generated from a single image. Our approach requires no additional training steps and only adds a simple layer to a standard convolutional neural network; nevertheless, it outperforms recent comparable techniques on the Pascal VOC 2012 benchmark in both weakly and semi-supervised settings.
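The core idea of randomly selecting hidden units and pooling the surviving activations into classification scores can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the feature map, classifier weights, drop rate, and function names below are all illustrative assumptions, and the sketch drops whole spatial locations rather than reproducing FickleNet's exact center-preserving selection scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_selection_map(features, drop_rate=0.5):
    """Randomly keep a subset of spatial locations ("hidden units")
    in a C x H x W feature map -- a rough stand-in for FickleNet's
    stochastic hidden unit selection."""
    c, h, w = features.shape
    mask = (rng.random((h, w)) > drop_rate).astype(features.dtype)
    return features * mask  # zero out the dropped locations

def classification_score(features, class_weights):
    """Global average pooling followed by a linear classifier,
    yielding one activation score per class (CAM-style)."""
    pooled = features.mean(axis=(1, 2))  # shape: (C,)
    return class_weights @ pooled        # shape: (num_classes,)

# Repeating the random selection on the same feature map produces
# different activation scores each time, so a single image yields a
# variety of localization cues from a single network.
features = rng.random((4, 8, 8))  # toy C x H x W feature map
weights = rng.random((3, 4))      # toy linear classifier, 3 classes
scores = [classification_score(random_selection_map(features), weights)
          for _ in range(5)]
```

Because each draw uses a different random mask, the five score vectors differ; aggregating the class-specific activations over many such draws is what produces the ensemble-like localization maps described above.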