Text retrieval is often formulated as mapping the query and the target items (e.g., passages) to the same vector space and finding the item whose embedding is closest to that of the query. In this paper, we explore a generative approach as an alternative, where we use an encoder-decoder model to memorize the target corpus in a generative manner and then finetune it on query-to-passage generation. As GENRE (Cao et al., 2021) has shown that entities can be retrieved in a generative way, our work can be considered a generalization of it to longer text. We show that the generative approach consistently achieves performance comparable to traditional bi-encoder retrieval on diverse datasets and is especially strong at retrieving highly structured items, such as reasoning chains and graph relations, while demonstrating superior GPU memory and time efficiency. We also conjecture that generative retrieval is complementary to traditional retrieval, as we find that an ensemble of both outperforms homogeneous ensembles.
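The two-stage recipe described above (memorize the corpus generatively, then finetune on query-to-passage generation, and retrieve by generating at test time) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a T5 backbone via Hugging Face transformers, and the corpus, training pairs, memorization objective, and hyperparameters are hypothetical placeholders.

```python
# Minimal sketch of two-stage generative retrieval with an encoder-decoder model.
# All data and hyperparameters below are illustrative placeholders.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

corpus = ["Passage one text ...", "Passage two text ..."]           # target passages (hypothetical)
train_pairs = [("who wrote passage one?", "Passage one text ...")]  # (query, passage) pairs (hypothetical)

def train_step(source_text, target_text):
    """One seq2seq update: encode the source, supervise the decoder with the target."""
    inputs = tokenizer(source_text, return_tensors="pt", truncation=True)
    labels = tokenizer(target_text, return_tensors="pt", truncation=True).input_ids
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Stage 1: memorize the corpus generatively, here by reconstructing each passage
# from a short prefix of itself (one possible memorization objective).
for passage in corpus:
    prefix = " ".join(passage.split()[:8])
    train_step(prefix, passage)

# Stage 2: finetune on query-to-passage generation.
for query, passage in train_pairs:
    train_step(query, passage)

# Retrieval at test time: beam-search generate candidate passages for a query.
model.eval()
with torch.no_grad():
    query_ids = tokenizer("who wrote passage one?", return_tensors="pt").input_ids
    outputs = model.generate(query_ids, num_beams=4, num_return_sequences=4, max_length=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

In a full system, decoding would typically be constrained so that generated outputs map back to actual corpus items; the sketch omits this and simply returns free-form beam candidates.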