Retrieving object instances among cluttered scenes efficiently requires compact yet comprehensive regional image representations. Intuitively, object semantics can help build the index that focuses on the most relevant regions. However, due to the lack of bounding-box datasets for objects of interest among retrieval benchmarks, most recent work on regional representations has focused on either uniform or class-agnostic region selection. In this paper, we first fill the void by providing a new dataset of landmark bounding boxes, based on the Google Landmarks dataset, that includes $86k$ images with manually curated boxes from $15k$ unique landmarks. Then, we demonstrate how a trained landmark detector, using our new dataset, can be leveraged to index image regions and improve retrieval accuracy while being much more efficient than existing regional methods. In addition, we introduce a novel regional aggregated selective match kernel (R-ASMK) to effectively combine information from detected regions into an improved holistic image representation. R-ASMK boosts image retrieval accuracy substantially with no dimensionality increase, while even outperforming systems that index image regions independently. Our complete image retrieval system improves upon the previous state-of-the-art by significant margins on the Revisited Oxford and Paris datasets. Code and data available at the project webpage: https://github.com/tensorflow/models/tree/master/research/delf.