Recommender systems (RS) have employed knowledge distillation, a model compression technique that trains a compact student model with knowledge transferred from a pre-trained large teacher model. Recent work has shown that transferring knowledge from the teacher's intermediate layers significantly improves the recommendation quality of the student. However, these methods transfer the knowledge of individual representations point-wise, and thus overlook that the primary information of RS lies in the relations within the representation space. This paper proposes a new topology distillation approach that guides the student by transferring the topological structure built upon the relations in the teacher's representation space. We first observe that simply making the student learn the whole topological structure is not always effective and can even degrade the student's performance. We demonstrate that, because the capacity of the student is highly limited compared to that of the teacher, learning the whole topological structure is daunting for the student. To address this issue, we propose a novel method named Hierarchical Topology Distillation (HTD), which distills the topology hierarchically to cope with the large capacity gap. Extensive experiments on real-world datasets show that the proposed method significantly outperforms state-of-the-art competitors. We also provide in-depth analyses to ascertain the benefit of distilling the topology for RS.
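To make the core idea concrete, the sketch below shows one plausible way to distill relational structure rather than individual points: build a pairwise-similarity ("topology") matrix over a batch of entities in both the teacher and student spaces, then train the student to match the teacher's matrix. This is a minimal illustration under our own assumptions, not the authors' HTD implementation (which additionally organizes the topology hierarchically); the function name and loss weighting are hypothetical.

```python
# Minimal sketch of topology-style distillation: match pairwise
# cosine-similarity matrices between teacher and student spaces.
# Hypothetical names; not the authors' HTD code.
import torch
import torch.nn.functional as F

def topology_distillation_loss(teacher_emb: torch.Tensor,
                               student_emb: torch.Tensor) -> torch.Tensor:
    """teacher_emb: (batch, d_teacher) frozen teacher representations.
    student_emb: (batch, d_student) trainable student representations.
    Returns an MSE loss between the two relational structures."""
    # Relational structure of each space: a batch x batch matrix of
    # cosine similarities between all pairs of entities in the batch.
    t = F.normalize(teacher_emb, dim=1)
    s = F.normalize(student_emb, dim=1)
    teacher_topology = t @ t.t()
    student_topology = s @ s.t()
    # Guide the student to preserve the teacher's relations, not to
    # match individual points; the teacher topology is a fixed target.
    return F.mse_loss(student_topology, teacher_topology.detach())

# Usage (hypothetical weighting): add the topology term to the
# base recommendation loss during student training, e.g.
#   loss = rec_loss + lambda_td * topology_distillation_loss(t_emb, s_emb)
```

Note that the teacher and student may have different embedding dimensions; only the batch-by-batch similarity matrices are compared, so no projection layer is needed in this simplified form.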