In various natural language processing tasks, passage retrieval and passage re-ranking are two key procedures for finding and ranking relevant information. Since both procedures contribute to the final performance, it is important to jointly optimize them in order to achieve mutual improvement. In this paper, we propose a novel joint training approach for dense passage retrieval and passage re-ranking. A major contribution is that we introduce dynamic listwise distillation, where we design a unified listwise training approach for both the retriever and the re-ranker. During the dynamic distillation, the retriever and the re-ranker can be adaptively improved according to each other's relevance information. We also propose a hybrid data augmentation strategy to construct diverse training instances for the listwise training approach. Extensive experiments show the effectiveness of our approach on both the MSMARCO and Natural Questions datasets. Our code is available at https://github.com/PaddlePaddle/RocketQA.
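As a rough illustration of the listwise distillation idea described above, the sketch below normalizes retriever and re-ranker relevance scores over a candidate passage list into distributions and measures their KL divergence, which can serve as a distillation signal pulling the retriever toward the re-ranker. This is a minimal, hypothetical sketch, not the authors' implementation; the function names and the exact choice of divergence direction are assumptions for exposition.

```python
import math

def softmax(scores):
    """Turn a list of relevance scores into a probability distribution."""
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def listwise_kl(retriever_scores, reranker_scores):
    """KL(re-ranker || retriever) over one list of candidate passages.

    Minimizing this quantity adapts the retriever's listwise relevance
    distribution toward the re-ranker's (a distillation-style signal).
    Hypothetical helper, not the paper's actual training code.
    """
    p = softmax(reranker_scores)   # distribution induced by the re-ranker
    q = softmax(retriever_scores)  # distribution induced by the retriever
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# When the two models agree on the ranking scores, the divergence is ~0;
# disagreement yields a positive loss that can drive joint training.
agree = listwise_kl([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
disagree = listwise_kl([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

In the actual joint training setup, such a listwise loss would be combined with the re-ranker's own supervised objective so that both components keep improving from each other's relevance information.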