Surrogate Text Representation (STR) is an effective solution for efficient similarity search in metric spaces using conventional text search engines, such as Apache Lucene. The technique compares permutations of a set of reference objects in place of the original metric distance. However, the Achilles heel of the STR approach is the need to reorder the search result set according to the metric distance. This forces the use of a support database to store the original objects, which in turn requires efficient random I/O on fast secondary memory (such as flash-based storage). In this paper, we propose an extension of the Surrogate Text Representation that specifically addresses a class of visual metric objects known as Vector of Locally Aggregated Descriptors (VLAD). The approach represents each of the individual sub-vectors forming the VLAD vector with the STR, which provides a finer representation of the vector and allows us to eliminate the reordering phase. Experiments on a publicly available dataset show that the extended STR outperforms the baseline STR, achieving satisfactory performance close to that obtained with the original VLAD vectors.
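The permutation-based encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `str_terms` and `vlad_surrogate_text`, the term prefixes, the Euclidean distance, and the rule of repeating the i-th closest reference object (k - i) times are all assumptions made for illustration, as is the use of one reference set per sub-vector. The idea is that term frequencies encode the rank of each reference object, so a text engine's term-frequency scoring can stand in for a comparison of permutations.

```python
import numpy as np

def str_terms(vec, refs, k, prefix="R"):
    """Surrogate terms for one vector (illustrative weighting, an assumption).

    The k nearest reference objects are ranked by Euclidean distance, and the
    one at rank i (0-based) is emitted (k - i) times, so closer reference
    objects get higher term frequency.
    """
    dists = np.linalg.norm(refs - vec, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    terms = []
    for rank, ref_id in enumerate(nearest):
        terms.extend([f"{prefix}{ref_id}"] * (k - rank))
    return terms

def vlad_surrogate_text(vlad, refs_per_block, k):
    """Encode each VLAD sub-vector separately (hypothetical helper).

    The vector is split into len(refs_per_block) equal sub-vectors; each is
    encoded against its own reference set, with a block-specific term prefix
    so that terms from different sub-vectors never collide.
    """
    blocks = np.split(vlad, len(refs_per_block))
    terms = []
    for b, (block, refs) in enumerate(zip(blocks, refs_per_block)):
        terms.extend(str_terms(block, refs, k, prefix=f"B{b}R"))
    return " ".join(terms)
```

The returned string could then be indexed as an ordinary text field in a search engine such as Apache Lucene, with the query image encoded in the same way. How the paper weights terms and selects reference objects may differ from this sketch.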