Weakly supervised object localization is a challenging task that aims to localize objects using only coarse annotations such as image-level category labels. Existing deep network approaches are mainly based on class activation maps, which tend to highlight discriminative local regions while ignoring the full extent of the object. In addition, emerging transformer-based techniques place considerable emphasis on background regions, which impedes their ability to identify complete objects. To address these issues, we present a re-attention mechanism termed the token refinement transformer (TRT) that captures object-level semantics to guide localization. Specifically, TRT introduces a novel module named the token priority scoring module (TPSM) to suppress the effects of background noise while focusing on the target object. We then incorporate the class activation map as a semantically aware input to restrain the attention map to the target object. Extensive experiments on two benchmarks demonstrate the superiority of our proposed method over existing methods that rely only on image-level category annotations. Source code is available at \url{https://github.com/su-hui-zz/ReAttentionTransformer}.
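To make the described mechanism concrete, the following is a minimal, illustrative sketch of the re-attention idea, not the authors' implementation: patch tokens are scored by the class token's attention, only the top-scoring tokens are kept to suppress background, and the result is fused with a class activation map (CAM) as semantic guidance. The function name `token_priority_map`, the element-wise fusion, and the `keep_ratio` parameter are assumptions for illustration.

\begin{verbatim}
# Illustrative sketch only; hypothetical names and a simple top-k
# heuristic, not the TPSM described in the paper.
import torch

def token_priority_map(cls_attn: torch.Tensor, cam: torch.Tensor,
                       keep_ratio: float = 0.3) -> torch.Tensor:
    """cls_attn: (B, N) attention from the class token to N patch tokens.
    cam: (B, N) class activation map flattened over patch positions.
    Returns a (B, N) localization map with background tokens suppressed."""
    B, N = cls_attn.shape
    k = max(1, int(N * keep_ratio))
    # Keep only the k highest-priority tokens; zero out likely background.
    _, topk_idx = cls_attn.topk(k, dim=1)
    mask = torch.zeros_like(cls_attn)
    mask.scatter_(1, topk_idx, 1.0)
    refined = cls_attn * mask
    # Use the CAM as semantically aware guidance: element-wise fusion
    # restrains the attention map to the target object.
    fused = refined * cam
    # Normalize to [0, 1] per image for thresholding or visualization.
    fused = fused - fused.amin(dim=1, keepdim=True)
    fused = fused / (fused.amax(dim=1, keepdim=True) + 1e-6)
    return fused

if __name__ == "__main__":
    B, N = 2, 196  # e.g., a 14x14 patch grid from a ViT
    attn, cam = torch.rand(B, N), torch.rand(B, N)
    print(token_priority_map(attn, cam).shape)  # torch.Size([2, 196])
\end{verbatim}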