Recommender systems (RS) have started to employ knowledge distillation, a model compression technique that trains a compact model (the student) with knowledge transferred from a cumbersome model (the teacher). The state-of-the-art methods rely on unidirectional distillation, transferring knowledge only from the teacher to the student, under the assumption that the teacher is always superior to the student. However, we demonstrate that the student performs better than the teacher on a significant proportion of the test set, especially for RS. Based on this observation, we propose a Bidirectional Distillation (BD) framework whereby the teacher and the student collaboratively improve each other. Specifically, each model is trained with a distillation loss that makes it follow the other's predictions, along with its original loss function. For effective bidirectional distillation, we propose a rank discrepancy-aware sampling scheme that distills only the informative knowledge capable of fully enhancing each other; the scheme is designed to cope effectively with the large performance gap between the teacher and the student. Trained in this bidirectional way, both the teacher and the student are significantly improved compared to when they are trained separately. Our extensive experiments on real-world datasets show that the proposed framework consistently outperforms the state-of-the-art competitors. We also provide analyses for an in-depth understanding of BD and ablation studies verifying the effectiveness of each proposed component.
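To make the training objective concrete, below is a minimal NumPy sketch of one bidirectional-distillation step for a single user. It is an illustrative sketch only: the binary-cross-entropy distillation loss, the softmax-over-rank-gap sampling probability, and the weights lambda_s, lambda_t, tau, and k are assumptions for exposition, not the paper's exact formulation.

```python
# Illustrative sketch of one bidirectional-distillation step (single user).
# Loss form, sampling rule, and hyperparameters are assumptions, not the paper's exact method.
import numpy as np

rng = np.random.default_rng(0)

def rank_discrepancy_sampling(scores_a, scores_b, k, tau=10.0):
    """Sample k items, favoring items whose ranks differ most between the two models."""
    rank_a = np.argsort(np.argsort(-scores_a))  # 0 = top-ranked item under model a
    rank_b = np.argsort(np.argsort(-scores_b))
    discrepancy = np.abs(rank_a - rank_b).astype(float)
    probs = np.exp(discrepancy / tau)           # larger rank gap -> higher sampling probability
    probs /= probs.sum()
    return rng.choice(len(scores_a), size=k, replace=False, p=probs)

def distill_loss(follower_scores, target_scores, idx):
    """Cross-entropy pushing the follower's predictions toward the other model's soft targets."""
    p = 1.0 / (1.0 + np.exp(-follower_scores[idx]))  # follower's predicted probabilities
    q = 1.0 / (1.0 + np.exp(-target_scores[idx]))    # soft targets from the other model
    return -np.mean(q * np.log(p + 1e-8) + (1.0 - q) * np.log(1.0 - p + 1e-8))

# Toy per-user item scores from a large teacher and a compact student.
teacher_scores = rng.normal(size=100)
student_scores = rng.normal(size=100)

idx_s = rank_discrepancy_sampling(teacher_scores, student_scores, k=10)  # items distilled into the student
idx_t = rank_discrepancy_sampling(student_scores, teacher_scores, k=10)  # items distilled into the teacher

base_loss_student = 0.0  # placeholder for the student's original recommendation loss
base_loss_teacher = 0.0  # placeholder for the teacher's original recommendation loss
lambda_s, lambda_t = 0.5, 0.1  # the teacher typically receives a smaller distillation weight

student_total = base_loss_student + lambda_s * distill_loss(student_scores, teacher_scores, idx_s)
teacher_total = base_loss_teacher + lambda_t * distill_loss(teacher_scores, student_scores, idx_t)
```

In this sketch, each model's total loss combines its own recommendation loss with a distillation term toward the other model's predictions on rank-discrepancy-sampled items, reflecting the bidirectional setup described above.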