Although recent advances in deep learning have accelerated progress on weakly supervised object localization (WSOL), it remains challenging to identify the entire extent of an object rather than only its most discriminative parts. In this paper, we propose a novel residual fine-grained attention (RFGA) module that autonomously excites the less activated regions of an object by exploiting information distributed over the channels and locations of feature maps, in combination with a residual operation. Specifically, we devise a series of mechanisms: triple-view attention representation, attention expansion, and feature calibration. Unlike other attention-based WSOL methods that learn a coarse attention map, with the same value shared across elements of a feature map, our proposed RFGA learns a fine-grained attention map by assigning a distinct attention value to each element. We validated the superiority of our proposed RFGA module by comparing it with recent methods in the literature on three datasets. Further, we analyzed the effect of each mechanism in RFGA and visualized the attention maps to gain insights.
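The pipeline described above (triple-view attention representation, attention expansion to a per-element map, and residual feature calibration) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `rfga_sketch` is hypothetical, and plain average pooling stands in for whatever learned projections the actual module uses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rfga_sketch(x):
    """Hypothetical sketch of residual fine-grained attention.

    x: feature map of shape (C, H, W).
    """
    C, H, W = x.shape
    # Triple-view attention representation: pool over the two
    # complementary axes to obtain one descriptor per view
    # (channel, height, width). Assumption: simple average pooling
    # replaces the module's learned attention representations.
    a_c = x.mean(axis=(1, 2))   # (C,)
    a_h = x.mean(axis=(0, 2))   # (H,)
    a_w = x.mean(axis=(0, 1))   # (W,)
    # Attention expansion: broadcast the three views to a full
    # (C, H, W) map so every element receives its own attention
    # value, rather than a single coarse value per channel.
    A = a_c[:, None, None] + a_h[None, :, None] + a_w[None, None, :]
    A = sigmoid(A)              # squash attention values into (0, 1)
    # Feature calibration with a residual operation: the identity
    # path keeps less activated regions from being suppressed.
    return x + A * x
```

The residual form `x + A * x` is the key point: even where the attention `A` is small, the original activation passes through unchanged, so weakly activated object regions are preserved rather than zeroed out.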