Weakly Supervised Object Localization (WSOL) methods produce both classification and localization results while learning from image-level category labels only. Previous methods usually rely on the class activation map (CAM) to obtain target object regions. However, most of them focus only on improving the foreground object parts in the CAM and ignore the important effect of its background content. In this paper, we propose a confidence segmentation (ConfSeg) module that builds a confidence score for each pixel in the CAM without introducing additional hyper-parameters. The generated sample-specific confidence mask indicates how certain the CAM is about each pixel and further supervises an additional CAM derived from internal feature maps. In addition, we introduce a Co-supervised Augmentation (CoAug) module that captures feature-level representations for the foreground and background parts of the CAM separately. A metric loss is then applied at the batch level to strengthen the discriminative ability of our model, which helps localize more related object parts. Our final model, CSoA, combines the two modules and achieves superior performance, e.g., $37.69\%$ and $48.81\%$ Top-1 localization error on the CUB-200 and ILSVRC datasets, respectively, outperforming all previous methods and setting a new state of the art.
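To make the two ideas above concrete, the sketch below illustrates, under assumptions not taken from the paper, (i) a ConfSeg-style per-pixel confidence mask built from the agreement of two CAM branches and (ii) a CoAug-style metric loss that separates CAM-weighted foreground and background descriptors. All function names, shapes, and formulations here are hypothetical simplifications, not the authors' exact method.

```python
import torch
import torch.nn.functional as F

def confidence_mask(cam_a: torch.Tensor, cam_b: torch.Tensor) -> torch.Tensor:
    """Hypothetical ConfSeg-style mask: pixels where two CAM branches agree
    (both clearly foreground or both clearly background) receive high confidence.
    cam_a, cam_b: (B, 1, H, W)."""
    def norm(c):
        # Min-max normalize each CAM to [0, 1] per sample.
        c = c - c.amin(dim=(-2, -1), keepdim=True)
        return c / (c.amax(dim=(-2, -1), keepdim=True) + 1e-6)
    a, b = norm(cam_a), norm(cam_b)
    return 1.0 - (a - b).abs()  # 1 where the two maps agree; used as a supervision weight

def coaug_metric_loss(features: torch.Tensor, cam: torch.Tensor, margin: float = 0.0) -> torch.Tensor:
    """Hypothetical CoAug-style loss: pool features under the foreground and
    background regions of the CAM and push the two descriptors apart.
    features: (B, C, H, W), cam: (B, 1, H, W)."""
    fg_w = torch.sigmoid(cam)                      # soft foreground weights
    bg_w = 1.0 - fg_w                              # complementary background weights
    fg = (features * fg_w).flatten(2).mean(-1)     # (B, C) foreground descriptor
    bg = (features * bg_w).flatten(2).mean(-1)     # (B, C) background descriptor
    fg, bg = F.normalize(fg, dim=1), F.normalize(bg, dim=1)
    sim = (fg * bg).sum(dim=1)                     # cosine similarity per sample
    return F.relu(sim - margin).mean()             # penalize fg/bg descriptors that look alike
```

In this toy version, the confidence mask could weight a per-pixel consistency loss between the two CAM branches, while the metric term is averaged over the batch; the actual CSoA formulation may differ.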