Weakly supervised object localization (WSOL) remains an open problem because a classification network is poorly suited to recovering the full extent of an object. While prior works attempt to localize objects through various spatial regularization strategies, we argue that the question of how to extract object structural information from the trained classification network has been neglected. In this paper, we propose a two-stage approach, termed structure-preserving activation (SPA), to fully leverage the structural information contained in convolutional features for WSOL. In the first stage, a restricted activation module (RAM) is designed to alleviate the structure-missing issue caused by the classification network, based on the observation that the unbounded classification map and the global average pooling layer drive the network to focus only on object parts. In the second stage, we propose a post-processing approach, termed the self-correlation map generating (SCG) module, to obtain structure-preserving localization maps from the activation maps acquired in the first stage. Specifically, we use high-order self-correlation (HSC) to extract the inherent structural information retained in the learned model and then aggregate the HSC of multiple points for precise object localization. Extensive experiments on two publicly available benchmarks, CUB-200-2011 and ILSVRC, show that the proposed SPA achieves substantial and consistent performance gains over baseline approaches.
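To make the second-stage idea more concrete, the following is a minimal sketch of how self-correlation between position-wise features might spread a part-only activation over the whole object. It is an illustration under stated assumptions, not the formulation used in SPA: the function names, the use of cosine similarity, the element-wise maximum used to combine first- and second-order correlation, the `top_k` seed selection, and the toy data are all hypothetical choices made here.

```python
import math


def cosine_self_correlation(features):
    """Pairwise cosine similarity between feature vectors, clamped at zero.

    `features` is a list of vectors, one per spatial position.
    """
    # Guard against zero-length vectors by falling back to a norm of 1.0.
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in features]
    unit = [[v / n for v in f] for f, n in zip(features, norms)]
    return [
        [max(0.0, sum(a * b for a, b in zip(u, w))) for w in unit]
        for u in unit
    ]


def localization_map(features, activation, top_k=1):
    """Aggregate self-correlation rows of the most activated positions."""
    sc1 = cosine_self_correlation(features)
    # Assumed second-order term: similarity between rows of the first-order matrix.
    sc2 = cosine_self_correlation(sc1)
    # Assumed combination rule: element-wise maximum of the two orders.
    hsc = [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(sc1, sc2)]
    # Seed points are the top_k positions of the first-stage activation map.
    order = sorted(range(len(activation)), key=lambda i: activation[i], reverse=True)
    seeds = order[:top_k]
    return [sum(hsc[s][j] for s in seeds) / len(seeds) for j in range(len(features))]


# Toy data: positions 0-2 share similar features (the "object"),
# position 3 is dissimilar (the "background").
features = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2], [0.0, 1.0]]
# The classifier-style activation fires mainly on one object part (position 0).
activation = [0.9, 0.1, 0.2, 0.0]
loc = localization_map(features, activation, top_k=1)
```

In this toy setting the activation highlights only position 0, while the aggregated map also scores positions 1 and 2 highly because their features correlate with the seed, and the dissimilar position 3 stays low.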