We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of any retrieval method (e.g. neural or keyword-based), does not require any domain- or task-specific training (and therefore is expected to generalize better to data distribution shifts), and provides rich cross-attention between query and passage (i.e. it must explain every token in the question). When evaluated on a number of open-domain retrieval datasets, our re-ranker improves strong unsupervised retrieval models by 6%-18% absolute and strong supervised models by up to 12% in terms of top-20 passage retrieval accuracy. We also obtain new state-of-the-art results on full open-domain question answering by simply adding the new re-ranker to existing models with no further changes.
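To make the scoring idea concrete, here is a minimal sketch of the re-ranker under stated assumptions: it uses a Hugging Face seq2seq language model (T5) to compute the average log-probability of the question conditioned on each retrieved passage and re-orders passages by that score. The model name, prompt wording, and helper function names are illustrative choices, not the authors' exact implementation.

```python
# Sketch of zero-shot question-generation re-ranking (assumptions noted above).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-large")  # illustrative model choice
model = AutoModelForSeq2SeqLM.from_pretrained("t5-large").eval()

@torch.no_grad()
def question_likelihood(question: str, passage: str) -> float:
    """Average log-probability of the question tokens given the passage."""
    # Encoder input: the passage plus an instruction-style prompt (assumed wording).
    enc = tokenizer(
        f"Passage: {passage} Please write a question based on this passage.",
        return_tensors="pt",
        truncation=True,
    )
    # Decoder target: the input question.
    labels = tokenizer(question, return_tensors="pt").input_ids
    out = model(**enc, labels=labels)
    # `out.loss` is the mean token-level cross-entropy over the question tokens,
    # i.e. the negative average log-likelihood; negate it to get a score.
    return -out.loss.item()

def rerank(question: str, passages: list[str]) -> list[str]:
    """Re-order retrieved passages by the zero-shot question-generation score."""
    scores = [question_likelihood(question, p) for p in passages]
    return [p for _, p in sorted(zip(scores, passages), reverse=True)]
```

In use, `rerank` would be applied to the top passages returned by any first-stage retriever (dense or keyword-based), which is why no retriever-specific training is needed in this sketch.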