Recently, information retrieval has seen the emergence of dense retrievers, based on neural networks, as an alternative to classical sparse methods based on term frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new applications with no training data, and are outperformed by unsupervised term-frequency methods such as BM25. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark, our unsupervised model outperforms BM25 on 11 out of 15 datasets for the Recall@100 metric. When used as pre-training before fine-tuning, either on a few thousand in-domain examples or on the large MS MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that it leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only and evaluated on low-resource languages such as Swahili. We show that our unsupervised models can perform cross-lingual retrieval between different scripts, such as retrieving English documents from Arabic queries, which would not be possible with term-matching methods.
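To make the training signal concrete, the following is a minimal sketch (not the authors' code) of a contrastive objective for a bi-encoder dense retriever, assuming an in-batch-negatives InfoNCE loss over query and document embeddings; the encoder, temperature value, and batch construction are illustrative assumptions, and the exact positive-pair construction used in this work is not specified here.

import torch
import torch.nn.functional as F


def contrastive_loss(query_emb: torch.Tensor,
                     key_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE loss: each query's positive is the key at the same batch index;
    all other keys in the batch serve as negatives."""
    # Cosine-similarity scores between every query and every key in the batch.
    query_emb = F.normalize(query_emb, dim=-1)
    key_emb = F.normalize(key_emb, dim=-1)
    scores = query_emb @ key_emb.T / temperature          # (batch, batch)
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)


if __name__ == "__main__":
    # Toy usage with random vectors standing in for encoder outputs.
    batch, dim = 8, 768
    q = torch.randn(batch, dim, requires_grad=True)
    k = torch.randn(batch, dim)
    loss = contrastive_loss(q, k)
    loss.backward()
    print(f"loss = {loss.item():.4f}")

In practice, the two inputs would be embeddings produced by a shared or paired text encoder for queries and documents, and minimizing this loss pulls matching pairs together while pushing apart the other documents in the batch.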