Large-scale trademark retrieval is an important content-based image retrieval task. A recent study shows that off-the-shelf deep features aggregated with Regional-Maximum Activation of Convolutions (R-MAC) achieve state-of-the-art results. However, R-MAC suffers in the presence of background clutter/trivial regions and scale variance, and discards important spatial information. We introduce three simple but effective modifications to R-MAC to overcome these drawbacks. First, we propose the use of both sum and max pooling to minimise the loss of spatial information. We also employ domain-specific unsupervised soft-attention to eliminate background clutter and unimportant regions. Finally, we add multi-resolution inputs to enhance the scale-invariance of R-MAC. We evaluate these three modifications on the million-scale METU dataset. Our results show that all modifications bring non-trivial improvements, and surpass previous state-of-the-art results.
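The three modifications described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the region sampler, how the sum- and max-pooled vectors are combined, or how the attention map is computed. The sketch therefore assumes a simplified non-overlapping grid in place of the R-MAC region sampler, concatenation of the two pooled vectors, and a channel-summed activation map as a stand-in for the unsupervised soft-attention. All function names (`pool_regions`, `channel_sum_attention`, `multi_resolution_descriptor`) are hypothetical.

```python
import numpy as np


def l2n(v, eps=1e-12):
    """L2-normalise a vector along its last axis."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)


def regions(h, w, levels=3):
    """Yield (y0, y1, x0, x1) windows on a coarse-to-fine grid.

    Assumption: level l splits the map into l x l non-overlapping cells.
    The real R-MAC sampler uses overlapping square regions instead.
    """
    for l in range(1, levels + 1):
        ys = np.linspace(0, h, l + 1).astype(int)
        xs = np.linspace(0, w, l + 1).astype(int)
        for i in range(l):
            for j in range(l):
                if ys[i + 1] > ys[i] and xs[j + 1] > xs[j]:
                    yield ys[i], ys[i + 1], xs[j], xs[j + 1]


def channel_sum_attention(fmap, eps=1e-12):
    """Stand-in soft-attention: channel-summed activations scaled to [0, 1]."""
    s = fmap.sum(axis=0)
    s = s - s.min()
    return s / (s.max() + eps)


def pool_regions(fmap, attention=None, levels=3):
    """Aggregate a (C, H, W) feature map into a 2C-dimensional descriptor.

    Each region contributes a max-pooled and a sum-pooled vector, which are
    normalised separately and concatenated (an assumed combination rule).
    """
    c, h, w = fmap.shape
    if attention is not None:
        # Down-weight locations the attention map considers unimportant.
        fmap = fmap * attention[None, :, :]
    desc = np.zeros(2 * c)
    for y0, y1, x0, x1 in regions(h, w, levels):
        patch = fmap[:, y0:y1, x0:x1].reshape(c, -1)
        r_max = l2n(patch.max(axis=1))
        r_sum = l2n(patch.sum(axis=1))
        desc += np.concatenate([r_max, r_sum])
    return l2n(desc)


def multi_resolution_descriptor(fmaps, levels=3):
    """Sum the descriptors of feature maps taken at several input resolutions."""
    total = sum(pool_regions(f, channel_sum_attention(f), levels) for f in fmaps)
    return l2n(total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random non-negative maps stand in for CNN activations of one image
    # fed to the network at two different input sizes.
    maps = [rng.random((8, 7, 9)), rng.random((8, 14, 18))]
    print(multi_resolution_descriptor(maps).shape)
```

Because every descriptor is L2-normalised, two images can be compared with a plain dot product, which is equivalent to cosine similarity. Summing across resolutions keeps the descriptor length fixed at 2C regardless of how many input scales are used.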