Pre-trained language models have become a crucial component of ranking systems and have recently achieved impressive results. To maintain high performance while keeping computation efficient, knowledge distillation is widely used. In this paper, we focus on two key questions in knowledge distillation for ranking models: 1) how to ensemble knowledge from multiple teachers; 2) how to utilize the label information of the data during distillation. We propose a unified algorithm, Pairwise Iterative Logits Ensemble (PILE), to tackle both questions simultaneously. PILE ensembles multi-teacher logits under the supervision of label information in an iterative manner, and achieves competitive performance in both offline and online experiments. The proposed method has been deployed in a real-world commercial search system.
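To illustrate the general idea of label-supervised multi-teacher logit ensembling, the sketch below weights each teacher by its pairwise agreement with the gold labels and averages the teachers' logits accordingly. This is a minimal, hypothetical example (function and variable names are ours); it is not the paper's PILE procedure, which is specified in the later sections.

```python
import numpy as np

# Hypothetical sketch: combine multiple teachers' relevance logits for one
# query into a single distillation target, letting the gold labels decide
# how much each teacher is trusted. Not the exact PILE algorithm.

def ensemble_teacher_logits(teacher_logits, labels):
    """teacher_logits: (num_teachers, num_docs) scores for one query.
    labels: (num_docs,) graded relevance labels.
    Returns a (num_docs,) ensembled logit vector."""
    num_teachers, num_docs = teacher_logits.shape
    weights = np.zeros(num_teachers)
    for t in range(num_teachers):
        agree = total = 0
        # Pairwise agreement between the teacher's ranking and the labels.
        for i in range(num_docs):
            for j in range(i + 1, num_docs):
                if labels[i] == labels[j]:
                    continue
                total += 1
                same_order = (teacher_logits[t, i] - teacher_logits[t, j]) * (labels[i] - labels[j]) > 0
                agree += int(same_order)
        weights[t] = agree / max(total, 1)
    weights = weights / max(weights.sum(), 1e-8)
    # Weighted average of teacher logits serves as the distillation target.
    return weights @ teacher_logits

# Toy usage: teacher A agrees with the labels, teacher B mostly does not.
teachers = np.array([[2.0, 1.0, 0.5],
                     [0.3, 1.5, 2.2]])
labels = np.array([2, 1, 0])
print(ensemble_teacher_logits(teachers, labels))
```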