Similarity search is a fundamental building block for information retrieval on a variety of datasets. The notion of a neighbor is often based on binary considerations, such as the k nearest neighbors. However, considering that data is often organized as a manifold with low intrinsic dimension, the notion of a neighbor must recognize higher-order relationship, to capture neighbors in all directions. Proximity graphs, such as the Relative Neighbor Graphs (RNG), use trinary relationships which capture the notion of direction and have been successfully used in a number of applications. However, the current algorithms for computing the RNG, despite widespread use, are approximate and not scalable. This paper proposes a novel type of graph, the Generalized Relative Neighborhood Graph (GRNG) for use in a pivot layer that then guides the efficient and exact construction of the RNG of a set of exemplars. It also shows how to extend this to a multi-layer hierarchy which significantly improves over the state-of-the-art methods which can only construct an approximate RNG.
翻译:相似性搜索是各种数据集信息检索的基本基石。 邻居的概念往往基于二进制的考虑, 如 k 最近的邻居。 但是, 考虑到数据往往是以低内在维度的元体组织起来的, 邻居的概念必须承认高阶关系, 以便从各个方向捕捉邻居关系。 邻近图, 如相邻图( RNG ), 使用能捕捉方向概念的三角关系, 并在一些应用中成功使用。 然而, 目前的计算 RNG 的算法尽管广泛使用, 却近似且不可缩放。 本文提出了一种新的图表类型, 即通用相对邻里图( GRNG ), 用于一个能指导 RNG 高效和精确构建一组Exemplaker 。 它还表明如何将此扩展为多层次的等级, 大大改进了只能构建近似RNG 的状态方法 。