Knowledge distillation is a learning paradigm for boosting resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the Local Structure Preserving loss (LSP), which matches local structural relationships defined over edges across the student and teacher's node embeddings. This paper studies whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. We propose Graph Contrastive Representation Distillation (G-CRD), which uses contrastive learning to implicitly preserve global topology by aligning the student node embeddings to those of the teacher in a shared representation space. Additionally, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. Experiments across 4 datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNNs, outperforming LSP (and a global structure preserving variant of LSP) as well as baselines from 2D computer vision. An analysis of the representational similarity among teacher and student embedding spaces reveals that G-CRD balances preserving local and global relationships, while structure preserving approaches are best at preserving one or the other.
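To make the core idea concrete, below is a minimal, illustrative sketch of a contrastive representation distillation loss for GNN node embeddings, in the spirit of G-CRD as summarized above. It is not the authors' exact implementation; names such as `proj_dim` and `tau`, and the choice of linear projection heads, are assumptions made for the example.

```python
# Minimal sketch (assumed details, not the paper's reference code): an
# InfoNCE-style contrastive loss that aligns student node embeddings to a
# frozen teacher's embeddings in a shared representation space. For each node,
# the teacher embedding of the same node is the positive; all other nodes in
# the batch serve as negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveDistillLoss(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int,
                 proj_dim: int = 128, tau: float = 0.075):
        super().__init__()
        self.proj_s = nn.Linear(student_dim, proj_dim)  # map student into shared space
        self.proj_t = nn.Linear(teacher_dim, proj_dim)  # map teacher into shared space
        self.tau = tau                                  # temperature

    def forward(self, h_student: torch.Tensor, h_teacher: torch.Tensor) -> torch.Tensor:
        # h_student: [N, student_dim], h_teacher: [N, teacher_dim], same N nodes.
        z_s = F.normalize(self.proj_s(h_student), dim=-1)
        z_t = F.normalize(self.proj_t(h_teacher.detach()), dim=-1)  # teacher not updated
        logits = z_s @ z_t.t() / self.tau                # [N, N] pairwise similarities
        targets = torch.arange(z_s.size(0), device=z_s.device)  # positives on the diagonal
        return F.cross_entropy(logits, targets)


# Usage sketch: combine with the student's supervised task loss, e.g.
#   loss = task_loss + lambda_crd * crd_loss(h_s, h_t)
# where lambda_crd is a weighting hyperparameter (an assumed name).
```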