Recently, information retrieval has seen the emergence of dense retrievers, based on neural networks, as an alternative to classical sparse methods based on term frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new applications with no training data, and are outperformed by unsupervised term-frequency methods such as BM25. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark, our unsupervised model outperforms BM25 on 11 out of 15 datasets for Recall@100. When used as pre-training before fine-tuning, either on a few thousand in-domain examples or on the large MS~MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that it leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only and evaluated on low-resource languages such as Swahili. We show that our unsupervised models can perform cross-lingual retrieval between different scripts, such as retrieving English documents from Arabic queries, which would not be possible with term-matching methods.
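As a minimal sketch of such a contrastive objective, assuming a similarity score $s(q, d)$ given by the dot product between the encoder representations of a query $q$ and a document $d$, a temperature $\tau$, one positive document $d^{+}$ and negatives $d^{-}_{1}, \dots, d^{-}_{K}$, the loss can take the standard InfoNCE form (the exact instantiation used in our experiments may differ):
\[
\mathcal{L}(q, d^{+}) = -\log \frac{\exp\!\left(s(q, d^{+}) / \tau\right)}{\exp\!\left(s(q, d^{+}) / \tau\right) + \sum_{i=1}^{K} \exp\!\left(s(q, d^{-}_{i}) / \tau\right)}.
\]
In an unsupervised setting, positive pairs can be built from unlabeled text alone, for instance by pairing two spans sampled from the same document, while other documents in the batch serve as negatives.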