We present a two-stage learning framework for weakly supervised object localization (WSOL). While most previous efforts rely on class activation maps (CAMs) built from high-level features, this paper proposes to localize objects using activation maps built from low-level features. In the first stage, an activation map generator produces activation maps from the low-level feature maps of the classifier, so that rich contextual object information is incorporated online. In the second stage, an evaluator assesses the activation maps predicted by the generator. Building on this evaluator, we further propose a weighted entropy loss, attentive erasing, and an area loss, which drive the generator to substantially reduce the uncertainty of activations between object and background and to explore less discriminative regions. Using the low-level object information preserved in the first stage, the second-stage model gradually generates a well-separated, complete, and compact activation map of the object in the image, which can be easily thresholded for accurate localization. Extensive experiments on the CUB-200-2011 and ImageNet-1K datasets show that our framework surpasses previous methods by a large margin, setting a new state of the art for WSOL.
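The final step mentioned above, thresholding an activation map to localize the object, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `box_from_activation_map`, the `rel_threshold` parameter, and its default of 0.5 are assumptions made for illustration, and the box is drawn around all above-threshold pixels rather than, for example, the largest connected component.

```python
import numpy as np


def box_from_activation_map(act_map, rel_threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) around above-threshold pixels.

    Hypothetical helper: the map is min-max normalized to [0, 1], then
    every pixel with a value >= rel_threshold counts as foreground.
    Returns None if the map is constant (no object/background contrast).
    """
    act = np.asarray(act_map, dtype=np.float64)
    lo, hi = act.min(), act.max()
    if hi <= lo:
        return None
    norm = (act - lo) / (hi - lo)
    mask = norm >= rel_threshold
    ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of foreground
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy 6x8 map with a bright 3x4 block standing in for the object.
toy_map = np.zeros((6, 8))
toy_map[2:5, 3:7] = 0.9
toy_box = box_from_activation_map(toy_map)  # (3, 2, 6, 4)
```

A well-separated map makes the result insensitive to the exact threshold, which is presumably why the abstract describes the thresholding as easy; a diffuse map would give boxes that shift noticeably as `rel_threshold` changes.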