Weakly supervised object localization (WSOL) relaxes the requirement of dense annotations for object localization by using image-level classification labels to supervise its learning process. However, current WSOL methods suffer from excessive activation of background locations and need post-processing to obtain the localization mask. This paper attributes these issues to the unawareness of background cues and proposes the background-aware classification activation map (B-CAM), which simultaneously learns localization scores for both object and background using only image-level labels. In B-CAM, two image-level features, aggregated from the pixel-level features of potential background and object locations, are used to purify the object feature from the object-related background and to represent the feature of a pure-background sample, respectively. Based on these two features, an object classifier and a background classifier are learned to determine the binary object localization mask. B-CAM can be trained in an end-to-end manner with a proposed stagger classification loss, which not only improves object localization but also suppresses background activation. Experiments show that B-CAM outperforms one-stage WSOL methods on the CUB-200, OpenImages and VOC2012 datasets.
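The aggregation and two-classifier idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the soft background-probability weighting, and the rule "object score greater than background score" are assumptions chosen only to make the idea concrete, and the stagger classification loss is not modeled here.

```python
import numpy as np

def aggregate_features(pixel_feats, bg_prob, eps=1e-8):
    """Pool pixel-level features into two image-level features.

    pixel_feats: (N, C) array, one C-dim feature per spatial location.
    bg_prob:     (N,) array in [0, 1], assumed estimate of how likely
                 each location is background (hypothetical input).
    """
    obj_w = 1.0 - bg_prob
    # Weighted averages over locations: object-weighted and background-weighted.
    obj_feat = (obj_w[:, None] * pixel_feats).sum(axis=0) / (obj_w.sum() + eps)
    bg_feat = (bg_prob[:, None] * pixel_feats).sum(axis=0) / (bg_prob.sum() + eps)
    return obj_feat, bg_feat

def localization_mask(pixel_feats, w_obj, w_bg):
    """Binary mask: a location is 'object' if the (assumed linear) object
    classifier scores it higher than the background classifier."""
    obj_score = pixel_feats @ w_obj
    bg_score = pixel_feats @ w_bg
    return obj_score > bg_score

# Toy example: 4 locations, 2 channels. Channel 0 fires on the object,
# channel 1 fires on the background.
pixel_feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
bg_prob = np.array([0.0, 0.0, 1.0, 1.0])

obj_feat, bg_feat = aggregate_features(pixel_feats, bg_prob)
mask = localization_mask(pixel_feats,
                         w_obj=np.array([1.0, 0.0]),
                         w_bg=np.array([0.0, 1.0]))
print(obj_feat, bg_feat, mask)
```

Because the mask comes from comparing two classifier scores at each location, no separate thresholding step is needed in this sketch, which mirrors the abstract's claim that post-processing can be avoided.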