Many recent approaches to neural information retrieval mitigate their computational cost by using a multi-stage ranking pipeline. In the first stage, a number of potentially relevant candidates are retrieved using an efficient retrieval model such as BM25. Although BM25 has proven to perform decently as a first-stage ranker, it tends to miss relevant passages. In this context, we propose CoRT, a simple neural first-stage ranking model that leverages contextual representations from pretrained language models such as BERT to complement term-based ranking functions while causing no significant delay at query time. Using the MS MARCO dataset, we show that CoRT significantly increases candidate recall by complementing BM25 with missing candidates. Consequently, we find that subsequent re-rankers achieve superior results with fewer candidates. We further demonstrate that passage retrieval using CoRT can be realized with surprisingly low latencies.