Multi-hop retrieval is the task of retrieving a series of multiple documents that together provide sufficient evidence to answer a natural language query. A common practice for text retrieval is to use an encoder to map the documents and the query to a common vector space and perform nearest neighbor search (NNS); multi-hop retrieval often adopts the same paradigm, usually with the modification of iteratively reformulating the query vector so that it can retrieve different documents at each hop. However, the inherent limitations of such a bi-encoder approach worsen in the multi-hop setting. As the number of hops increases, the reformulated query increasingly depends on the documents retrieved in previous hops, which further tightens the embedding bottleneck of the query vector and makes it more prone to error propagation. In this paper, we focus on alleviating these limitations of the bi-encoder approach in multi-hop settings by formulating the problem in a fully generative way. We propose an encoder-decoder model that performs multi-hop retrieval by simply generating the entire text sequences of the retrieval targets, which means the query and the documents interact in the language model's parametric space rather than in L2 or inner product space as in the bi-encoder approach. Our approach, Generative Multi-hop Retrieval (GMR), consistently achieves performance comparable to or higher than bi-encoder models on five datasets while demonstrating a smaller GPU memory and storage footprint.
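To make the bi-encoder baseline concrete, the following is a minimal toy sketch of the iterative retrieve-and-reformulate loop described above. A bag-of-words projection stands in for a learned encoder, retrieval is inner product NNS, and query reformulation is simple concatenation of the retrieved document onto the query; all function names and the tiny corpus are illustrative, not from the paper.

```python
import math
from collections import Counter

def embed(text, vocab):
    # Toy stand-in for a learned encoder: L2-normalized bag-of-words vector.
    counts = Counter(text.lower().split())
    vec = [counts[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def multihop_retrieve(query, docs, hops=2):
    # Shared vector space for query and documents.
    vocab = sorted({w for t in docs + [query] for w in t.lower().split()})
    doc_vecs = [embed(d, vocab) for d in docs]
    retrieved = []
    for _ in range(hops):
        qv = embed(query, vocab)
        # Nearest neighbor search by inner product over unretrieved documents.
        best = max((i for i in range(len(docs)) if i not in retrieved),
                   key=lambda i: sum(a * b for a, b in zip(qv, doc_vecs[i])))
        retrieved.append(best)
        # Reformulate the query with the evidence retrieved at this hop,
        # so the next hop can reach a different document.
        query = query + " " + docs[best]
    return retrieved

docs = ["alice was born in paris",
        "paris is the capital of france"]
print(multihop_retrieve("where was alice born", docs))  # → [0, 1]
```

The second hop only succeeds because the first hop's document is folded back into the query; this is exactly the dependency the abstract identifies, where each reformulated query must squeeze the accumulated evidence through a single fixed-size vector.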