Knowledge distillation is a promising learning paradigm for boosting the performance and reliability of resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the Local Structure Preserving loss (LSP), which matches local structural relationships across the student's and teacher's node embedding spaces. In this paper, we make two key contributions: From a methodological perspective, we study whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. The purely local LSP objective over pre-defined edges is unable to achieve this, as it ignores relationships among disconnected nodes. We propose two new approaches that better preserve global topology: (1) Global Structure Preserving loss (GSP), which extends LSP to incorporate all pairwise interactions; and (2) Graph Contrastive Representation Distillation (G-CRD), which uses contrastive learning to align the student node embeddings to those of the teacher in a shared representation space. From an experimental perspective, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. We believe this is critical for testing the efficacy and robustness of knowledge distillation, but it was missing from the LSP study, which used synthetic datasets with trivial performance gaps. Experiments across 4 datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNN models, outperforming the structure-preserving approaches, LSP and GSP, as well as baselines adapted from 2D computer vision.
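To make the two proposed objectives concrete, the following is a minimal PyTorch sketch of GSP and G-CRD as described above. The function names, the cosine-similarity kernel with MSE matching for GSP, the linear projection heads, and the temperature value are all illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def gsp_loss(h_s, h_t):
    # Global Structure Preserving (GSP) sketch: match the FULL (N x N)
    # pairwise similarity structure of the teacher's embedding space,
    # including disconnected node pairs that the edge-local LSP objective
    # ignores. Cosine similarity + MSE are assumed kernel/matching choices.
    s = F.normalize(h_s, dim=1)
    t = F.normalize(h_t, dim=1)
    return F.mse_loss(s @ s.T, t @ t.T)

def gcrd_loss(h_s, h_t, proj_s, proj_t, tau=0.1):
    # Graph Contrastive Representation Distillation (G-CRD) sketch:
    # project both embedding sets into a shared space, then pull each
    # student node towards its own teacher embedding (positive pair, the
    # diagonal) and push it away from all other teacher embeddings
    # (negatives) via an InfoNCE-style cross-entropy over the logits.
    z_s = F.normalize(proj_s(h_s), dim=1)
    z_t = F.normalize(proj_t(h_t), dim=1)
    logits = (z_s @ z_t.T) / tau           # (N, N) similarity logits
    targets = torch.arange(h_s.size(0))    # positive = same node index
    return F.cross_entropy(logits, targets)

# Usage sketch: a 64-dim student distilled towards a 256-dim teacher,
# with random embeddings standing in for GNN outputs.
N, d_s, d_t, d_shared = 128, 64, 256, 32
h_student, h_teacher = torch.randn(N, d_s), torch.randn(N, d_t)
proj_s = torch.nn.Linear(d_s, d_shared)
proj_t = torch.nn.Linear(d_t, d_shared)
print(gsp_loss(h_student, h_teacher).item())
print(gcrd_loss(h_student, h_teacher, proj_s, proj_t).item())
```

Note that GSP compares (N, N) similarity matrices, so the student and teacher embedding dimensions may differ without any projection, whereas G-CRD needs the projection heads to place both sets of embeddings in a shared representation space before the contrastive alignment.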