Retrieving object instances among cluttered scenes efficiently requires compact yet comprehensive regional image representations. Intuitively, object semantics can help build the index that focuses on the most relevant regions. However, due to the lack of bounding-box datasets for objects of interest among retrieval benchmarks, most recent work on regional representations has focused on either uniform or class-agnostic region selection. In this paper, we first fill the void by providing a new dataset of landmark bounding boxes, based on the Google Landmarks dataset, that includes $94k$ images with manually curated boxes from $15k$ unique landmarks. Then, we demonstrate how a trained landmark detector, using our new dataset, can be leveraged to index image regions and improve retrieval accuracy while being much more efficient than existing regional methods. In addition, we further introduce a novel regional aggregated selective match kernel (R-ASMK) to effectively combine information from detected regions into an improved holistic image representation. R-ASMK boosts image retrieval accuracy substantially at no additional memory cost, while even outperforming systems that index image regions independently. Our complete image retrieval system improves upon the previous state-of-the-art by significant margins on the Revisited Oxford and Paris datasets. Code and data will be released.