Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels. Class activation maps (CAMs) are the features most commonly used for WSOL. However, previous CAM-based methods have not taken full advantage of shallow features, despite their importance for WSOL, because shallow features are easily buried in background noise under conventional fusion. In this paper, we propose a simple but effective Shallow feature-aware Pseudo supervised Object Localization (SPOL) model for accurate WSOL, which makes the most of the low-level features embedded in shallow layers. In practice, our SPOL model first generates CAMs through a novel element-wise multiplication of shallow and deep feature maps, which filters out background noise and robustly produces sharper boundaries. We further propose a general class-agnostic segmentation model that produces an accurate object mask using only the initial CAMs as pseudo labels, without any extra annotation. Finally, a bounding box extractor is applied to the object mask to locate the target. Experiments verify that SPOL outperforms the state of the art on both the CUB-200 and ImageNet-1K benchmarks, achieving 93.44% and 67.15% Top-5 localization accuracy (improvements of 3.93% and 2.13%), respectively.
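The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the SPOL implementation: all function names, the nearest-neighbour upsampling, the min-max normalisation, and the 0.5 threshold are assumptions made for illustration, and the class-agnostic segmentation model is replaced here by a plain threshold on the CAM as a stand-in. The sketch only shows why an element-wise product suppresses shallow-layer responses wherever the deep features are inactive, and how a box can be read off a binary mask.

```python
import numpy as np


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_multiplicative(shallow, deep):
    """Element-wise product of shallow features and upsampled deep features.

    Assumes both inputs share the channel count and that the shallow
    resolution is an integer multiple of the deep resolution.
    """
    factor = shallow.shape[1] // deep.shape[1]
    return shallow * upsample_nearest(deep, factor)


def class_activation_map(features, class_weights):
    """Weighted sum over channels, min-max normalised to [0, 1]."""
    cam = np.tensordot(class_weights, features, axes=([0], [0]))
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def bounding_box(mask):
    """Tightest (x_min, y_min, x_max, y_max) around True pixels, or None."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def localize(shallow, deep, class_weights, threshold=0.5):
    """Fuse, build a CAM, threshold it (stand-in for segmentation), extract a box."""
    fused = fuse_multiplicative(shallow, deep)
    cam = class_activation_map(fused, class_weights)
    return bounding_box(cam >= threshold)
```

With a product, a strong shallow response in a region where the deep features are zero contributes nothing to the fused map, whereas an additive fusion would keep it. In the method described above, the thresholding step would instead be the learned segmentation model trained on the initial CAMs as pseudo labels.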