Neural information retrieval (IR) has greatly advanced search and other knowledge-intensive language tasks. While many neural IR methods encode queries and documents into single-vector representations, late interaction models produce multi-vector representations at the granularity of each token and decompose relevance modeling into scalable token-level computations. This decomposition has been shown to make late interaction more effective, but it inflates the space footprint of these models by an order of magnitude. In this work, we introduce ColBERTv2, a retriever that couples an aggressive residual compression mechanism with a denoised supervision strategy to simultaneously improve the quality and reduce the space footprint of late interaction. We evaluate ColBERTv2 across a wide range of benchmarks, establishing state-of-the-art quality within and outside the training domain while reducing the space footprint of late interaction models by 6--10$\times$.
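For concreteness, the scalable token-level computation referenced above is ColBERT's MaxSim operator: relevance is the sum, over query tokens, of each token's maximum similarity to any document token. Below is a minimal NumPy sketch of this scoring rule; the shapes, names, and toy data are illustrative and not taken from the ColBERT codebase.

```python
import numpy as np

def late_interaction_score(Q: np.ndarray, D: np.ndarray) -> float:
    """Score one query against one document under late interaction.

    Q: (n_query_tokens, dim) query token embeddings.
    D: (n_doc_tokens, dim) document token embeddings.
    """
    sim = Q @ D.T                         # (n_query_tokens, n_doc_tokens) dot products
    return float(sim.max(axis=1).sum())   # MaxSim per query token, summed over the query

# Toy usage with random unit-normalized embeddings.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8)); Q /= np.linalg.norm(Q, axis=1, keepdims=True)
D = rng.normal(size=(6, 8)); D /= np.linalg.norm(D, axis=1, keepdims=True)
print(late_interaction_score(Q, D))
```

Because each query token needs only its nearest document tokens, this decomposition lets retrieval be served from token-level nearest-neighbor indexes rather than exhaustive document scoring.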
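The residual compression mechanism stores each token embedding as the index of its nearest centroid plus a compact, low-bit encoding of the residual from that centroid. The sketch below uses a simple uniform per-vector quantizer for the residual; this bucketing scheme is an assumption for illustration (the paper's quantizer differs in its details), and all names here are hypothetical.

```python
import numpy as np

def compress(v, centroids, bits=2):
    """Encode an embedding as (nearest-centroid id, quantized residual).

    NOTE: uniform per-vector bucketing is illustrative, not ColBERTv2's
    exact quantizer; storing per-vector bounds is also for clarity only.
    """
    cid = int(np.argmin(np.linalg.norm(centroids - v, axis=1)))
    residual = v - centroids[cid]
    lo, hi = residual.min(), residual.max()
    codes = np.round((residual - lo) / (hi - lo + 1e-9) * (2**bits - 1)).astype(np.uint8)
    return cid, codes, (lo, hi)

def decompress(cid, codes, bounds, centroids, bits=2):
    """Reconstruct an approximate embedding from its compressed form."""
    lo, hi = bounds
    residual = codes.astype(np.float32) / (2**bits - 1) * (hi - lo) + lo
    return centroids[cid] + residual

# Toy usage: reconstruction error should be small relative to the vector norm.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(16, 8)).astype(np.float32)
v = rng.normal(size=8).astype(np.float32)
cid, codes, bounds = compress(v, centroids)
v_hat = decompress(cid, codes, bounds, centroids)
print(np.linalg.norm(v - v_hat))
```

The intuition is that token embeddings cluster tightly around shared centroids, so the residual carries little energy and survives aggressive low-bit quantization, which is what drives the 6--10$\times$ reduction in space footprint.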