In this paper, we propose Suppression-Enhancing Mask based attention and Interactive Channel transformatiON (SEMICON) to learn binary hash codes for large-scale fine-grained image retrieval. In SEMICON, we first develop a suppression-enhancing mask (SEM) based attention to dynamically localize discriminative image regions. More importantly, unlike existing attention mechanisms that simply erase previously found discriminative regions, our SEM restrains such regions and then discovers other complementary regions by considering the relation between activated regions in a stage-by-stage fashion. In each stage, the interactive channel transformation (ICON) module then exploits correlations across the channels of the attended activation tensors. Since channels generally correspond to parts of fine-grained objects, the correlation between parts is modeled as well, which further improves fine-grained retrieval accuracy. Moreover, to keep computation economical, ICON is realized by an efficient two-step process. Finally, the hash learning of SEMICON consists of both global- and local-level branches, which better represent fine-grained objects and generate binary hash codes that explicitly correspond to multiple levels. Experiments on five benchmark fine-grained datasets show the superiority of SEMICON over competing methods.
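To make the retrieval setting concrete, the following is a minimal sketch of how multi-level binary hash codes could be formed and compared. It is not the SEMICON network: the function names (`binarize`, `multi_level_code`, `hamming`, `rank`), the zero threshold, the simple concatenation of global and local codes, and the toy feature values are all assumptions made for illustration. It only shows the general idea that real-valued global- and local-level features are mapped to bits and that database images are ranked by Hamming distance to the query code.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Map real-valued features to bits by thresholding at zero (an assumed rule)."""
    return [1 if x > 0 else 0 for x in features]


def multi_level_code(global_feat: Sequence[float],
                     local_feats: Sequence[Sequence[float]]) -> List[int]:
    """Concatenate the global-level bits with the bits of each local-level feature."""
    code = binarize(global_feat)
    for feat in local_feats:
        code.extend(binarize(feat))
    return code


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank(query: Sequence[int], database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered from nearest to farthest in Hamming distance."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Toy example: one 2-d global feature and two 2-d local features give a 6-bit code.
query_code = multi_level_code([0.3, -1.2], [[0.5, 0.1], [-0.4, 0.9]])
database_codes = [
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
]
ranking = rank(query_code, database_codes)
```

Because the comparison uses only bit mismatches, ranking a large database reduces to cheap integer operations, which is the usual motivation for hashing in large-scale retrieval.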