Knowledge Distillation (KD) aims at transferring knowledge from a larger, well-optimized teacher network to a smaller, learnable student network. Existing KD methods have mainly considered two types of knowledge, namely individual knowledge and relational knowledge. However, these two types of knowledge are usually modeled independently, and the inherent correlations between them are largely ignored. Integrating both individual and relational knowledge while preserving their inherent correlation is critical for effective student network learning. In this paper, we propose to distill novel holistic knowledge based on an attributed graph constructed among instances. The holistic knowledge is represented as a unified graph-based embedding obtained by aggregating individual knowledge from relational neighborhood samples with graph neural networks; the student network is then trained by distilling this holistic knowledge in a contrastive manner. Extensive experiments and ablation studies conducted on benchmark datasets demonstrate the effectiveness of the proposed method. The code has been published at https://github.com/wyc-ruiker/HKD
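To make the described pipeline concrete, the following is a minimal sketch (not the authors' official implementation; see the repository above for that). It illustrates the idea in the abstract: build a kNN graph over instance features, aggregate neighborhood ("individual") knowledge with a single graph-convolution step to obtain a holistic embedding, and align the student to the teacher with a contrastive (InfoNCE-style) loss. All function names, dimensions, and hyper-parameters here are illustrative assumptions.

```python
# Minimal sketch of graph-based holistic knowledge distillation (illustrative only).
import torch
import torch.nn.functional as F


def knn_adjacency(features: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Row-normalized kNN adjacency (with self-loops) built from cosine similarity."""
    norm = F.normalize(features, dim=1)
    sim = norm @ norm.t()
    idx = sim.topk(k + 1, dim=1).indices                        # neighbors incl. self
    adj = torch.zeros_like(sim).scatter_(1, idx, 1.0)
    return adj / adj.sum(dim=1, keepdim=True)


def holistic_embedding(features: torch.Tensor, weight: torch.Tensor,
                       adj: torch.Tensor) -> torch.Tensor:
    """One graph-convolution step: aggregate neighbors, then a linear projection."""
    return F.relu(adj @ features @ weight)


def contrastive_kd_loss(student_h: torch.Tensor, teacher_h: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE: each student embedding should match its own teacher embedding."""
    s = F.normalize(student_h, dim=1)
    t = F.normalize(teacher_h, dim=1)
    logits = s @ t.t() / temperature                            # (B, B) similarities
    targets = torch.arange(s.size(0), device=s.device)          # positives on the diagonal
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    batch, d_t, d_s, d_out = 32, 512, 128, 64
    teacher_feat = torch.randn(batch, d_t)                      # frozen teacher features
    student_feat = torch.randn(batch, d_s, requires_grad=True)  # learnable student features
    w_t = torch.randn(d_t, d_out)                               # illustrative GNN weights
    w_s = torch.randn(d_s, d_out, requires_grad=True)

    adj = knn_adjacency(teacher_feat)                           # attributed graph among instances
    loss = contrastive_kd_loss(holistic_embedding(student_feat, w_s, adj),
                               holistic_embedding(teacher_feat, w_t, adj))
    loss.backward()
    print(f"holistic KD loss: {loss.item():.4f}")
```

In this sketch the graph is built from teacher features and shared by both networks, so the student's neighborhood aggregation is supervised by the teacher's holistic view of the batch; the actual method may construct and use the graph differently.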