Existing approximate nearest neighbor search systems suffer from two fundamental problems that are of practical importance but have not received sufficient attention from the research community. First, although existing systems perform well for the whole database, it is difficult to run a search over a subset of the database. Second, there has been no discussion concerning the performance decrement after many items have been newly added to a system. We develop a reconfigurable inverted index (Rii) to resolve these two issues. Based on the standard IVFADC system, we design a data layout such that items are stored linearly. This enables us to efficiently run a subset search by switching the search method to a linear PQ scan if the size of a subset is small. Owing to the linear layout, the data structure can be dynamically adjusted after new items are added, maintaining the fast speed of the system. Extensive comparisons show that Rii achieves a comparable performance with state-of-the art systems such as Faiss.
翻译:现有的近邻搜索系统存在两个基本问题,这些问题具有实际重要性,但没有得到研究界的足够重视。首先,虽然现有系统对整个数据库运作良好,但很难对数据库的一个子集进行搜索。第二,许多项目新加入一个系统后,没有讨论性能下降的问题。我们开发了一个可重新配置的倒置索引(Rii),以解决这两个问题。根据标准的IVFADC系统,我们设计了一个数据布局,将物品储存成线性。这使我们能够通过将搜索方法转换成线性PQ扫描,在子集规模小的情况下,将搜索方法转换成线性PQ扫描,从而高效率地进行子搜索。由于线性布局,数据结构可以在增加新项目后动态调整,保持系统的快速速度。广泛的比较表明,Rii与Faiss等最先进的系统取得了相似的性能。