Existing knowledge distillation methods for graph neural networks (GNNs) are almost exclusively offline, where a student model extracts knowledge from a powerful pre-trained teacher model to improve its performance. However, a pre-trained teacher model is not always accessible due to training cost, privacy, etc. In this paper, we propose a novel online knowledge distillation framework to resolve this problem. Specifically, each student GNN learns the extracted local structure from another, simultaneously trained counterpart in an alternating training procedure. We further develop a cross-layer distillation strategy that aligns one student layer ahead with a layer at a different depth of another student model, which in theory spreads the structure information over all layers. Experimental results on five datasets, including PPI, Coauthor-CS/Physics, and Amazon-Computer/Photo, demonstrate that student performance is consistently boosted in our collaborative training framework without the supervision of a pre-trained teacher model. In addition, we find that our alignahead technique accelerates model convergence, and its effectiveness generally improves as the number of students in training increases. Code is available at: https://github.com/GuoJY-eatsTG/Alignahead
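The following is a minimal PyTorch sketch of the idea described above, not the authors' implementation (see the linked repository for that). All class and function names are hypothetical, and it assumes a simple instantiation of "local structure" as a pairwise cosine-similarity matrix over node embeddings, with layer l of one student aligned to layer l+1 of the other (wrapping at the top), under an alternating update schedule.

```python
# Hypothetical sketch of online cross-layer ("align ahead") distillation between
# two student GNNs; the real method may define local structure and the layer
# pairing differently.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyGNNLayer(nn.Module):
    """Toy graph layer: neighborhood aggregation via a dense adjacency + linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj is assumed to be a row-normalized dense adjacency matrix (N x N)
        return F.relu(self.lin(adj @ x))


class StudentGNN(nn.Module):
    def __init__(self, in_dim, hid_dim, num_layers=3):
        super().__init__()
        dims = [in_dim] + [hid_dim] * num_layers
        self.layers = nn.ModuleList(
            TinyGNNLayer(dims[i], dims[i + 1]) for i in range(num_layers)
        )

    def forward(self, x, adj):
        hiddens = []
        for layer in self.layers:
            x = layer(x, adj)
            hiddens.append(x)
        return hiddens  # one node-embedding matrix per layer


def local_structure(h):
    """Pairwise cosine-similarity matrix as a simple proxy for local structure."""
    h = F.normalize(h, dim=-1)
    return h @ h.t()


def alignahead_loss(hiddens_a, hiddens_b):
    """Align layer l of student A with layer l+1 of student B (wrapping at the top)."""
    num_layers = len(hiddens_a)
    loss = 0.0
    for l in range(num_layers):
        target = local_structure(hiddens_b[(l + 1) % num_layers]).detach()
        loss = loss + F.mse_loss(local_structure(hiddens_a[l]), target)
    return loss / num_layers


# Toy usage on random data (illustration only, no real dataset).
N, F_in, H, C = 32, 16, 32, 4
x = torch.randn(N, F_in)
adj = torch.softmax(torch.randn(N, N), dim=-1)  # fake row-normalized adjacency
y = torch.randint(0, C, (N,))

s1, s2 = StudentGNN(F_in, H), StudentGNN(F_in, H)
head1, head2 = nn.Linear(H, C), nn.Linear(H, C)
opt1 = torch.optim.Adam(list(s1.parameters()) + list(head1.parameters()), lr=1e-3)
opt2 = torch.optim.Adam(list(s2.parameters()) + list(head2.parameters()), lr=1e-3)

for step in range(10):
    # Alternating procedure: update student 1 against a frozen forward pass of
    # student 2, then swap roles.
    for student, head, opt, peer in ((s1, head1, opt1, s2), (s2, head2, opt2, s1)):
        hid = student(x, adj)
        with torch.no_grad():
            peer_hid = peer(x, adj)
        loss = F.cross_entropy(head(hid[-1]), y) + 0.1 * alignahead_loss(hid, peer_hid)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The weight 0.1 on the distillation term and the cyclic layer pairing are placeholders; because every layer of one student is matched to a shifted layer of the other, repeated alternating updates let the structure information propagate across all layers, which is the intuition behind the cross-layer strategy.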