In this paper, we study the problem of conducting self-supervised learning for node representation learning on non-homophilous graphs. Existing self-supervised learning methods typically assume that the graph is homophilous, where linked nodes often belong to the same class or have similar features. However, such assumptions of homophily do not always hold in real-world graphs. We address this problem by developing a decoupled self-supervised learning (DSSL) framework for graph neural networks. DSSL imitates a generative process of nodes and links via latent variable modeling of the semantic structure, which decouples the different underlying semantics between different neighborhoods in the self-supervised learning process. Our DSSL framework is agnostic to the encoder and does not require prefabricated augmentations, and is thus flexible to different graphs. To effectively optimize the framework with latent variables, we derive the evidence lower bound of the self-supervised objective and develop a scalable training algorithm based on variational inference. We further provide a theoretical analysis to justify that DSSL enjoys better downstream performance. Extensive experiments on various types of graph benchmarks demonstrate that our proposed framework achieves significantly better performance than competitive self-supervised learning baselines.
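For context on the optimization strategy mentioned above, a minimal sketch of the standard evidence lower bound (ELBO) is given below; the symbols here are generic (an observation $x$, a latent variable $z$, a variational posterior $q_\phi$, and a generative model $p_\theta$) and are not the exact quantities used in DSSL:

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right).$$

Maximizing the right-hand side jointly over $\theta$ and $\phi$ is the standard variational-inference recipe that scalable training algorithms for latent variable models, such as the one described here, build upon.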