Knowledge distillation has recently become a popular technique for improving the generalization ability of convolutional neural networks. However, its effect on graph neural networks is less than satisfactory, since the graph topology and node attributes are likely to change dynamically, and in that case a static teacher model is insufficient to guide student training. In this paper, we tackle this challenge by simultaneously training a group of graph neural networks in an online distillation fashion, where the group knowledge acts as a dynamic virtual teacher and structural changes in the graph neural networks are effectively captured. To improve distillation performance, two types of knowledge are transferred among the students so that they enhance one another: local knowledge, reflecting information in the graph topology and node attributes, and global knowledge, reflecting the predictions over classes. We transfer the global knowledge with KL-divergence, as vanilla knowledge distillation does, while exploiting the complicated structure of the local knowledge with an efficient adversarial cyclic learning framework. Extensive experiments verify the effectiveness of our proposed online adversarial distillation approach.
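The global-knowledge transfer mentioned above is the standard KL-divergence distillation term, here applied between peer students rather than a fixed teacher. The sketch below is a minimal, hedged illustration of that term only (the local adversarial cyclic component is not shown); the function names, the temperature value, and the use of NumPy rather than a deep-learning framework are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softened_probs(logits, T):
    # Temperature-scaled softmax: higher T produces softer class distributions.
    z = logits / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def kl_distillation_loss(student_logits, peer_logits, T=2.0):
    # KL(peer || student) on softened predictions, scaled by T^2 as in
    # vanilla knowledge distillation; the peer plays the "virtual teacher" role.
    p = softened_probs(np.asarray(peer_logits, dtype=float), T)
    q = softened_probs(np.asarray(student_logits, dtype=float), T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())
```

In an online-distillation loop, each student would add such a term (averaged over its peers, with gradients blocked through the peer side) to its ordinary supervised loss; the loss is zero when the two students agree exactly and grows as their softened predictions diverge.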