Recent recommender systems have shown remarkable performance by using an ensemble of heterogeneous models. However, such ensembles are exceedingly costly: their resource usage and inference latency grow in proportion to the number of models, which remains a bottleneck for production deployment. Our work aims to transfer the ensemble knowledge of heterogeneous teachers to a lightweight student model using knowledge distillation (KD), to reduce the huge inference cost while retaining high accuracy. Through an empirical study, we find that the efficacy of distillation drops severely when knowledge is transferred from heterogeneous teachers. Nevertheless, we show that an important signal for easing this difficulty can be obtained from the teachers' training trajectories. This paper proposes a new KD framework, named HetComp, that guides the student model by transferring easy-to-hard sequences of knowledge generated from the teachers' trajectories. To provide guidance according to the student's learning state, HetComp uses dynamic knowledge construction to provide progressively difficult ranking knowledge and adaptive knowledge transfer to gradually transfer finer-grained ranking information. Our comprehensive experiments show that HetComp significantly improves the distillation quality and the generalization of the student model.
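To make the easy-to-hard idea concrete, the sketch below illustrates one way dynamic knowledge construction could work: the student is supervised with ranking targets drawn from successive teacher checkpoints (earliest = easiest, final = hardest), advancing to a harder checkpoint once the student's discrepancy to the current targets has shrunk. This is a minimal illustration under stated assumptions, not the authors' reference implementation; names such as `TrajectoryDistiller` and `advance_threshold` are hypothetical, and the listwise loss is a generic simplified stand-in.

```python
import torch
import torch.nn.functional as F


class TrajectoryDistiller:
    """Serves ranking targets from teacher checkpoints, ordered from
    earliest (easiest) to final (hardest). Illustrative sketch only."""

    def __init__(self, checkpoint_topk, advance_threshold=0.5):
        # checkpoint_topk: list of LongTensors, one per checkpoint,
        # each of shape (num_users, K) holding top-K item ids.
        self.checkpoints = checkpoint_topk
        self.stage = 0                      # position along the trajectory
        self.threshold = advance_threshold  # relative-progress trigger (assumed)
        self.initial_gap = None

    def current_targets(self):
        return self.checkpoints[self.stage]

    def maybe_advance(self, student_scores):
        """Switch to the next (harder) checkpoint once the student's
        discrepancy to the current targets has shrunk enough -- a simple
        proxy for tracking the student's learning state."""
        gap = listwise_loss(student_scores, self.current_targets()).item()
        if self.initial_gap is None:
            self.initial_gap = gap
        if gap < self.threshold * self.initial_gap \
                and self.stage + 1 < len(self.checkpoints):
            self.stage += 1
            self.initial_gap = None  # reset the baseline for the new stage


def listwise_loss(student_scores, target_topk):
    """Simplified listwise distillation loss: push the teacher's top-K
    items toward the top of the student's ranking."""
    # student_scores: (num_users, num_items); target_topk: (num_users, K)
    log_probs = F.log_softmax(student_scores, dim=1)
    return -torch.gather(log_probs, 1, target_topk).mean()
```

In a training loop, one would alternate gradient steps on `listwise_loss(student_scores, distiller.current_targets())` with periodic calls to `distiller.maybe_advance(student_scores.detach())`. The paper's adaptive knowledge transfer additionally adjusts the granularity of the ranking supervision over time, which this sketch omits.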