A common practice for text retrieval is to use an encoder to map the documents and the query to a common vector space and perform a nearest neighbor search (NNS); multi-hop retrieval often adopts the same paradigm, usually with a modification of iteratively reformulating the query vector so that it retrieves different documents at each hop. However, such a bi-encoder approach has limitations in multi-hop settings: (1) the reformulated query gets longer as the number of hops increases, which further tightens the embedding bottleneck of the query vector, and (2) it is prone to error propagation. In this paper, we focus on alleviating these limitations in multi-hop settings by formulating the problem in a fully generative way. We propose an encoder-decoder model that performs multi-hop retrieval by simply generating the entire text sequences of the retrieval targets, which means the query and the documents interact in the language model's parametric space rather than in L2 or inner product space as in the bi-encoder approach. Our approach, Generative Multi-hop Retrieval (GMR), consistently achieves performance comparable to or higher than that of bi-encoder models on five datasets while demonstrating a superior GPU memory and storage footprint.
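The contrast between the two paradigms can be illustrated with a toy sketch (all names, embeddings, and the scoring function below are illustrative stand-ins, not the paper's actual model): a bi-encoder retrieves by inner product between fixed vectors, whereas generative retrieval scores the full text sequence of each candidate target with a language model. Here `toy_lm_logprob` is a hypothetical stand-in for a trained encoder-decoder's sequence likelihood.

```python
import numpy as np

# --- Bi-encoder baseline: query and documents interact in inner-product space ---
# Toy 2-d embeddings; a real system would use a trained encoder.
doc_vecs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
query_vec = np.array([0.85, 0.2])
nns_idx = int(np.argmax(doc_vecs @ query_vec))  # nearest neighbor by inner product

# --- Generative retrieval (GMR-style, toy sketch): score each candidate's
# entire text sequence and return the highest-scoring document text.
docs = [
    "paris is the capital of france",
    "the eiffel tower is in paris",
    "berlin is the capital of germany",
]

def toy_lm_logprob(query: str, doc: str) -> float:
    # Stand-in scorer: word overlap as a crude proxy for log p(doc | query);
    # the actual model would compute this in its parametric space.
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / (len(d) + 1)

query = "what city is the eiffel tower in"
best_doc = max(docs, key=lambda d: toy_lm_logprob(query, d))
```

In the bi-encoder path, every document is compressed into one fixed vector before the query is seen; in the generative path, the candidate's full token sequence is scored against the query, which is what lets the query and documents interact inside the model rather than in a fixed vector space.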