Current dense text retrieval models face two typical challenges. First, they adopt a siamese dual-encoder architecture that encodes queries and documents independently for fast indexing and search, but neglects finer-grained term-wise interactions, resulting in sub-optimal recall performance. Second, they rely heavily on a negative-sampling technique to construct the negative documents used in the contrastive loss. To address these challenges, we present Adversarial Retriever-Ranker (AR2), which consists of a dual-encoder retriever plus a cross-encoder ranker. The two models are jointly optimized under a minimax adversarial objective: the retriever learns to retrieve negative documents that fool the ranker, while the ranker learns to rank a collection of candidates comprising both the ground-truth document and the retrieved ones, and in turn provides progressive direct feedback to the dual-encoder retriever. Through this adversarial game, the retriever gradually produces harder negative documents that train a better ranker, while the cross-encoder ranker provides progressive feedback that improves the retriever. We evaluate AR2 on three benchmarks. Experimental results show that AR2 consistently and significantly outperforms existing dense retrieval methods and achieves new state-of-the-art results on all of them, including improvements on Natural Questions R@5 to 77.9% (+2.1%), TriviaQA R@5 to 78.2% (+1.4%), and MS-MARCO MRR@10 to 39.5% (+1.3%). We will make our code, models, and data publicly available.
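As a hedged sketch of this minimax game (the notation here is ours, abstracted from the description above rather than taken from any specific equation), let $G$ denote the dual-encoder retriever and $D$ the cross-encoder ranker:

\[
\min_{D}\;\max_{G}\;\; \mathbb{E}_{q}\,\mathbb{E}_{\hat{\mathcal{D}}^{-}_{q}\sim G(\cdot \mid q)} \Big[ -\log D\big(d^{+} \mid q,\; \{d^{+}\} \cup \hat{\mathcal{D}}^{-}_{q}\big) \Big],
\]

where $d^{+}$ is the ground-truth document for query $q$ and $\hat{\mathcal{D}}^{-}_{q}$ is the set of negatives retrieved by $G$. The retriever maximizes the ranker's loss by proposing hard negatives, while the ranker minimizes it by ranking $d^{+}$ above them; in practice the two models would be updated alternately, with the ranker's scores serving as the progressive feedback signal for the retriever.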