To improve the performance of the dual-encoder retriever, one effective approach is knowledge distillation from the cross-encoder ranker. Existing works construct the candidate passages following the supervised learning setting, where a query is paired with one positive passage and a batch of negatives. However, through empirical observation, we find that even the hard negatives produced by advanced methods are still too trivial for the teacher to distinguish, preventing the teacher from transferring abundant dark knowledge to the student through its soft labels. To alleviate this issue, we propose ADAM, a knowledge distillation framework that better transfers the dark knowledge held in the teacher with Adaptive Dark exAMples. Unlike previous works that rely only on one positive and hard negatives as candidate passages, we create dark examples, all of which have moderate relevance to the query, through mixing-up and masking in discrete space. Furthermore, because the quality of the knowledge held in different training instances varies, as measured by the teacher's confidence score, we propose a self-paced distillation strategy that adaptively concentrates on a subset of high-quality instances for our dark-example-based knowledge distillation, helping the student learn better. We conduct experiments on two widely used benchmarks and verify the effectiveness of our method.
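To make the two dark-example operations and the confidence-gated distillation concrete, below is a minimal PyTorch-style sketch. It is an illustration under stated assumptions, not the paper's actual implementation: the helper names, the [MASK] token id, and the hyperparameters (mix_ratio, mask_prob, tau, conf_threshold) are all hypothetical.

```python
import random
import torch.nn.functional as F

MASK_TOKEN_ID = 103  # assumed BERT-style [MASK] id; illustrative only

def mix_up_discrete(pos_tokens, neg_tokens, mix_ratio=0.5):
    """Dark example via mixing-up in discrete space: splice token spans from
    a positive passage and a hard negative so the result has only moderate
    relevance to the query."""
    cut = int(len(pos_tokens) * mix_ratio)
    return pos_tokens[:cut] + neg_tokens[cut:]

def mask_discrete(tokens, mask_prob=0.3):
    """Dark example via masking in discrete space: randomly replace tokens of
    the positive passage with [MASK], degrading its relevance."""
    return [MASK_TOKEN_ID if random.random() < mask_prob else t for t in tokens]

def self_paced_kd_loss(student_logits, teacher_logits, tau=1.0, conf_threshold=0.5):
    """KL distillation over each query's candidate passages, keeping only the
    instances on which the teacher is confident (self-paced selection)."""
    t_prob = F.softmax(teacher_logits / tau, dim=-1)          # teacher soft labels
    s_logp = F.log_softmax(student_logits / tau, dim=-1)
    kl = F.kl_div(s_logp, t_prob, reduction="none").sum(-1)   # per-instance KL
    keep = (t_prob.max(dim=-1).values >= conf_threshold).float()
    return (kl * keep).sum() / keep.sum().clamp(min=1.0)
```

In this sketch the teacher's confidence is taken to be its maximum softmax probability over the candidate passages; any monotone confidence measure could be substituted without changing the overall self-paced scheme.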