We study the problem of \emph{vector set search} with \emph{vector set queries}. This task is analogous to traditional near-neighbor search, except that both the query and each element in the collection are \textit{sets} of vectors. We identify this problem as a core subroutine for many web applications and find that existing solutions are unacceptably slow. To address this, we present a new approximate search algorithm, DESSERT ({\bf D}ESSERT {\bf E}fficiently {\bf S}earches {\bf S}ets of {\bf E}mbeddings via {\bf R}etrieval {\bf T}ables). DESSERT is a general tool with strong theoretical guarantees and excellent empirical performance. When we integrate DESSERT into ColBERT, a highly optimized state-of-the-art semantic search method, we find a 2-5x speedup on the MSMarco passage ranking task with minimal loss in recall, underscoring the effectiveness and practical applicability of our proposal.