Deep pre-trained language models (e.g., BERT) are effective at large-scale text retrieval tasks. Due to the high computational cost of pre-trained language models and the large corpus size, existing text retrieval systems with state-of-the-art performance usually adopt a retrieve-then-rerank architecture. Under such a multi-stage architecture, previous studies have mainly focused on optimizing a single stage of the framework to improve overall retrieval performance. However, how to directly couple features from multiple stages for joint optimization has not been well studied. In this paper, we design Hybrid List Aware Transformer Reranking (HLATR), a subsequent reranking module that incorporates features from both the retrieval and reranking stages. HLATR is lightweight and can be easily parallelized with existing text retrieval systems, so that the reranking process can be performed in a single and efficient pass. Empirical experiments on two large-scale text retrieval datasets show that HLATR can efficiently improve the ranking performance of existing multi-stage text retrieval methods.
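To make the idea of coupling retrieval-stage and reranking-stage features concrete, the following is a minimal sketch, based only on the abstract, of a list-aware reranking module: it combines an assumed retrieval-stage feature (each candidate's rank position from the retriever) with an assumed reranking-stage feature (the document embedding produced by the PLM reranker) and scores the whole candidate list jointly with a lightweight transformer encoder. All names, dimensions, and hyperparameters are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class ListAwareReranker(nn.Module):
    """Sketch of a lightweight list-aware reranker fusing multi-stage features."""

    def __init__(self, feat_dim=768, hidden_dim=256, num_layers=2,
                 num_heads=4, max_candidates=1000):
        super().__init__()
        # Project the reranking-stage document embedding into the model space.
        self.doc_proj = nn.Linear(feat_dim, hidden_dim)
        # Embed the retrieval-stage rank position of each candidate (assumed feature).
        self.rank_embed = nn.Embedding(max_candidates, hidden_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True
        )
        # Lightweight transformer over the candidate list: candidates attend to each other.
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, doc_embeddings, retrieval_ranks):
        # doc_embeddings: (batch, num_candidates, feat_dim) from the PLM reranker
        # retrieval_ranks: (batch, num_candidates) integer ranks from the retriever
        x = self.doc_proj(doc_embeddings) + self.rank_embed(retrieval_ranks)
        x = self.encoder(x)
        return self.score(x).squeeze(-1)  # (batch, num_candidates) relevance scores


# Usage example: rescore 100 candidates for a batch of 2 queries.
model = ListAwareReranker()
doc_emb = torch.randn(2, 100, 768)
ranks = torch.arange(100).unsqueeze(0).expand(2, -1)
scores = model(doc_emb, ranks)
print(scores.shape)  # torch.Size([2, 100])
```

Because this module only consumes precomputed embeddings and rank positions rather than re-encoding query-document pairs, it adds little overhead on top of the existing retrieve-then-rerank pipeline, which is consistent with the abstract's claim that the module is lightweight and easy to run alongside existing systems.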