It has been shown that dual encoders trained on one domain often fail to generalize to other domains for retrieval tasks. One widespread belief is that the bottleneck layer of a dual encoder, where the final score is simply a dot product between a query vector and a passage vector, is too limited for dual encoders to serve as effective retrieval models for out-of-domain generalization. In this paper, we challenge this belief by scaling up the size of the dual encoder model {\em while keeping the bottleneck embedding size fixed}. With multi-stage training, surprisingly, scaling up the model size brings significant improvement on a variety of retrieval tasks, especially for out-of-domain generalization. Experimental results show that our dual encoders, \textbf{G}eneralizable \textbf{T}5-based dense \textbf{R}etrievers (GTR), significantly outperform existing sparse and dense retrievers on the BEIR dataset~\cite{thakur2021beir}. Most surprisingly, our ablation study finds that GTR is very data efficient, as it only needs 10\% of the MS MARCO supervised data to achieve the best out-of-domain performance. All the GTR models are released at https://tfhub.dev/google/collections/gtr/1.
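As a minimal illustration of the bottleneck discussed above (the notation $E_{\theta}$ and $d$ is ours, introduced only for exposition), the dual-encoder relevance score is a dot product between fixed-size embeddings, so scaling the encoder changes the representation quality but not the scoring interface:
\begin{equation*}
  s(q, p) \;=\; \big\langle E_{\theta}(q),\; E_{\theta}(p) \big\rangle,
  \qquad E_{\theta}: \text{text} \rightarrow \mathbb{R}^{d},
\end{equation*}
where the bottleneck embedding size $d$ is held fixed while the number of encoder parameters $|\theta|$ is scaled up.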