Neural retrievers based on pre-trained language models (PLMs), such as dual-encoders, have achieved promising performance on open-domain question answering (QA). Their effectiveness can be pushed to new state-of-the-art levels by incorporating cross-architecture knowledge distillation. However, most existing studies directly apply conventional distillation methods and fail to account for the setting where the teacher and student have different architectures. In this paper, we propose a novel distillation method that significantly advances cross-architecture distillation for dual-encoders. Our method 1) introduces a self on-the-fly distillation method that can effectively distill late interaction (i.e., ColBERT) into a vanilla dual-encoder, and 2) incorporates a cascade distillation process to further improve performance with a cross-encoder teacher. Extensive experiments validate that our proposed solution outperforms strong baselines and establishes a new state-of-the-art on open-domain QA benchmarks.
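To make the idea of distilling late interaction into a vanilla dual-encoder concrete, below is a minimal sketch of one plausible formulation: ColBERT-style MaxSim scores over in-batch candidates act as the teacher distribution, and a KL-divergence term pulls the dual-encoder's dot-product scores toward it on the fly. The function names, tensor shapes, pooling choice, and the specific KL objective and temperature are illustrative assumptions, not the paper's exact loss.

```python
# Illustrative sketch (assumed formulation, not the paper's exact loss):
# distill ColBERT-style late-interaction scores into a single-vector dual-encoder.
import torch
import torch.nn.functional as F

def late_interaction_scores(q_tok, d_tok):
    # q_tok: [B, Lq, H] query token embeddings; d_tok: [B, Ld, H] doc token embeddings.
    # ColBERT-style MaxSim: each query token takes its max similarity over doc tokens,
    # then similarities are summed over query tokens, for all B x B in-batch pairs.
    sim = torch.einsum("qih,djh->qdij", q_tok, d_tok)  # [B, B, Lq, Ld]
    return sim.max(dim=-1).values.sum(dim=-1)          # [B, B]

def dual_encoder_scores(q_vec, d_vec):
    # Vanilla dual-encoder: single-vector dot product over all in-batch pairs.
    return q_vec @ d_vec.t()                           # [B, B]

def on_the_fly_distill_loss(q_tok, d_tok, q_vec, d_vec, temperature=1.0):
    # KL divergence from the late-interaction (teacher) distribution to the
    # dual-encoder (student) distribution over in-batch candidates.
    teacher = F.softmax(late_interaction_scores(q_tok, d_tok).detach() / temperature, dim=-1)
    student = F.log_softmax(dual_encoder_scores(q_vec, d_vec) / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")

if __name__ == "__main__":
    B, Lq, Ld, H = 4, 8, 32, 128
    q_tok, d_tok = torch.randn(B, Lq, H), torch.randn(B, Ld, H)
    # Stand-in single vectors pooled from the same token embeddings
    # (in practice these could be, e.g., the [CLS] representations).
    q_vec, d_vec = q_tok.mean(dim=1), d_tok.mean(dim=1)
    print(on_the_fly_distill_loss(q_tok, d_tok, q_vec, d_vec))
```

Because both scoring functions are computed from the same encoder outputs in the same forward pass, the distillation signal is produced "on the fly" without a separately trained teacher; a cascade step with a cross-encoder teacher would add a further distillation objective on top of this.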