Nowadays, data is represented by vectors. Retrieving, from among millions or billions of vectors, those similar to a given query is a ubiquitous problem relevant to a wide range of applications. In this work, we present new techniques for creating faster and smaller indices to run these searches. To this end, we introduce a novel vector compression method, Locally-adaptive Vector Quantization (LVQ), that simultaneously reduces memory footprint and improves search performance, with minimal impact on search accuracy. LVQ is designed to work optimally in conjunction with graph-based indices, reducing their effective bandwidth while enabling random-access-friendly fast similarity computations. Our experimental results show that LVQ, combined with key optimizations for graph-based indices in modern datacenter systems, establishes the new state of the art in terms of performance and memory footprint. For billions of vectors, LVQ outperforms the second-best alternatives: (1) in the low-memory regime, by up to 20.7x in throughput with up to a 3x memory footprint reduction, and (2) in the high-throughput regime, by 5.8x with 1.4x less memory.
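To make the compression idea concrete, the sketch below shows per-vector ("locally adaptive") scalar quantization in the spirit of LVQ: each vector is quantized with its own range, so the quantization grid adapts to that vector's values. This is an illustrative simplification under assumed details (function names, 8-bit codes, and the uniform-grid formula are mine, not the paper's exact implementation, which also involves steps such as mean-centering).

```python
import numpy as np

def lvq_encode(x, bits=8):
    """Quantize one vector with its own (local) scalar grid.

    Illustrative sketch: each vector stores only its per-vector
    minimum and step size plus one small integer code per dimension.
    """
    lo, hi = float(x.min()), float(x.max())  # local, per-vector range
    levels = (1 << bits) - 1                 # e.g. 255 codes for 8 bits
    delta = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((x - lo) / delta).astype(np.uint8)
    return codes, lo, delta

def lvq_decode(codes, lo, delta):
    """Reconstruct an approximation of the original vector."""
    return lo + codes.astype(np.float32) * delta

# Usage: a 128-dim float32 vector shrinks from 512 bytes to
# 128 one-byte codes plus two scalars, with bounded error.
rng = np.random.default_rng(0)
x = rng.standard_normal(128).astype(np.float32)
codes, lo, delta = lvq_encode(x)
x_hat = lvq_decode(codes, lo, delta)
```

Because the grid is fit per vector rather than globally, the rounding error of each component is bounded by half the local step size, which is what keeps accuracy loss minimal while the compact codes reduce the memory bandwidth consumed during graph traversal.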