Near neighbor search (NNS) is a powerful abstraction for data access; however, building an index is costly even for approximate methods. For intrinsically high-dimensional data, fast high-quality searches demand indexes with either impractically large memory footprints or long preprocessing times. In this paper, we introduce an algorithm that solves a nearest-neighbor query $q$ by minimizing a kernel function defined by the distance from $q$ to each object in the database. The minimization is performed with metaheuristics to solve the problem rapidly; although some methods in the literature use this strategy implicitly, our approach is the first to use it explicitly. We also provide two approaches for selecting edges in the graph-construction stage that simultaneously limit the memory footprint and reduce the number of free parameters. We carry out a thorough experimental comparison with state-of-the-art indexes on synthetic and real-world datasets; we find that our contributions achieve competitive performance in speed, accuracy, and memory on almost all of our benchmarks.
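The core idea, casting NNS as the minimization of a distance objective over a proximity graph, can be illustrated with a minimal sketch. The snippet below implements plain greedy local search (a simple instance of the metaheuristic family; the paper's actual kernel, restart strategy, and edge-selection rules are not reproduced here), and the graph layout, function names, and Euclidean distance are illustrative assumptions:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_nns(graph, points, dist, q, start):
    """Greedy local search over a proximity graph: repeatedly move to the
    neighbor that reduces the distance to the query q, stopping at a local
    minimum of the objective. A sketch of the minimization view of NNS,
    not the paper's full algorithm."""
    current = start
    d_cur = dist(points[current], q)
    improved = True
    while improved:
        improved = False
        for nb in graph[current]:
            d = dist(points[nb], q)
            if d < d_cur:
                current, d_cur = nb, d
                improved = True
    return current, d_cur

# Toy example: four collinear points linked in a chain.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
result = greedy_nns(graph, points, euclidean, q=(3.0, 0.0), start=0)
# The walk descends 0 -> 1 -> 2 -> 3, reaching the exact nearest neighbor.
```

In high-dimensional data such a pure greedy descent can get trapped in local minima, which is precisely why metaheuristics (restarts, beam-like expansions, perturbations) are layered on top of it.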