Self-supervised learning on graphs has recently achieved remarkable success in graph representation learning. With hundreds of self-supervised pretext tasks proposed over the past few years, the field has matured rapidly, and the key challenge is no longer to design ever more powerful but complex pretext tasks, but to make more effective use of those already at hand. This paper studies the problem of how to automatically, adaptively, and dynamically learn instance-level self-supervised learning strategies for each node from a given pool of pretext tasks. We propose a novel multi-teacher knowledge distillation framework for Automated Graph Self-Supervised Learning (AGSSL), which consists of two main branches: (i) Knowledge Extraction: training multiple teachers with different pretext tasks, so as to extract different levels of knowledge with different inductive biases; (ii) Knowledge Integration: integrating the different levels of knowledge and distilling them into the student model. Rather than simply treating all teachers as equally important, we provide a provable theoretical guideline for how to integrate the knowledge of different teachers: the integrated teacher probability should be close to the true Bayesian class-probability. To approach this theoretical optimum in practice, we propose two adaptive knowledge integration strategies that construct a relatively "good" integrated teacher. Extensive experiments on eight datasets show that AGSSL benefits from multiple pretext tasks, outperforming the corresponding individual tasks; by combining a few simple but classical pretext tasks, it achieves performance comparable to other leading counterparts.
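The knowledge-integration step described above can be illustrated with a minimal numpy sketch. The function names, the fixed integration weights, and the example probabilities are all hypothetical; in AGSSL the weights would be learned adaptively per node, and the distillation loss would drive student training, but the core operation is a weighted combination of teacher class-probabilities matched against the student's prediction:

```python
import numpy as np

def integrate_teachers(teacher_probs, weights):
    """Weighted combination of per-teacher class-probability vectors.

    teacher_probs: array of shape (K, C) -- K teachers, C classes.
    weights: array of shape (K,) -- integration weights (in AGSSL these
    would be learned per node; fixed here purely for illustration).
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalise to a convex combination
    return weights @ np.asarray(teacher_probs)   # (C,) integrated distribution

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): distillation signal from integrated teacher p to student q."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

# Two hypothetical teachers trained with different pretext tasks,
# each outputting a class-probability vector for the same node.
teachers = [[0.7, 0.2, 0.1],
            [0.5, 0.3, 0.2]]
integrated = integrate_teachers(teachers, [0.6, 0.4])  # -> [0.62, 0.24, 0.14]
student = [0.6, 0.25, 0.15]
loss = kl_divergence(integrated, student)  # non-negative; 0 iff distributions match
```

The theoretical guideline in the paper says this integrated distribution should be close to the true Bayesian class-probability; the two adaptive strategies choose the weights toward that goal rather than fixing them as above.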