Recent self-supervised pre-training methods on Heterogeneous Information Networks (HINs) have shown promising competitiveness compared with traditional semi-supervised Heterogeneous Graph Neural Networks (HGNNs). Unfortunately, their performance heavily depends on the careful customization of various strategies for generating high-quality positive and negative examples, which notably limits their flexibility and generalization ability. In this work, we present SHGP, a novel Self-supervised Heterogeneous Graph Pre-training approach, which does not need to generate any positive or negative examples. It consists of two modules that share the same attention-aggregation scheme. In each iteration, the Att-LPA module produces pseudo-labels through structural clustering, which serve as self-supervision signals to guide the Att-HGNN module to learn object embeddings and attention coefficients. The two modules can effectively utilize and enhance each other, enabling the model to learn discriminative embeddings. Extensive experiments on four real-world datasets demonstrate that SHGP outperforms state-of-the-art unsupervised baselines and even semi-supervised baselines. We release our source code at: https://github.com/kepsail/SHGP.
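To make the two-module loop concrete, below is a minimal PyTorch sketch of the alternating scheme the abstract describes. It is not the authors' implementation (see the linked repository for that); the single object type, dense random adjacency, propagation depth, and module internals here are all illustrative assumptions. It only shows the key idea: both Att-LPA and Att-HGNN reuse the same learned attention coefficients, and the pseudo-labels produced by attention-weighted label propagation supervise the embedding model.

```python
# Hedged sketch of SHGP's alternating scheme (not the official code).
# Assumptions: one object type, a dense adjacency matrix, a one-layer
# GAT-style attention aggregator shared between the two modules.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, F_IN, HID, K = 100, 16, 32, 4              # objects, input dim, hidden dim, clusters
X = torch.randn(N, F_IN)                      # placeholder node features
A = (torch.rand(N, N) < 0.05).float()         # placeholder adjacency
A = ((A + A.t()) > 0).float()
A.fill_diagonal_(1.0)                         # self-loops

W = torch.nn.Linear(F_IN, HID)                # shared projection
a = torch.nn.Parameter(torch.randn(2 * HID))  # shared attention vector
clf = torch.nn.Linear(HID, K)                 # pseudo-label classifier head
opt = torch.optim.Adam([*W.parameters(), a, *clf.parameters()], lr=5e-3)

def attention_coefficients(H):
    """GAT-style attention over edges of A, masked softmax per row."""
    e = (H @ a[:HID]).unsqueeze(1) + (H @ a[HID:]).unsqueeze(0)
    e = F.leaky_relu(e).masked_fill(A == 0, float("-inf"))
    return torch.softmax(e, dim=1)            # (N, N), row-stochastic

labels = torch.randint(0, K, (N,))            # random initial cluster assignment

for it in range(50):
    H = torch.tanh(W(X))
    att = attention_coefficients(H)

    # Att-LPA: propagate one-hot labels with the SAME attention coefficients,
    # then re-assign each object to its dominant cluster (structural clustering).
    with torch.no_grad():
        Y = F.one_hot(labels, K).float()
        for _ in range(2):                    # a few propagation steps (assumed depth)
            Y = att @ Y
        labels = Y.argmax(dim=1)              # new pseudo-labels

    # Att-HGNN: aggregate neighbours with attention and fit the pseudo-labels,
    # which updates both the embeddings and the attention coefficients.
    Z = att @ H                               # attention-weighted aggregation
    loss = F.cross_entropy(clf(Z), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final object embeddings:", Z.shape)    # (N, HID)
```

In this toy setting the pseudo-labels and the embeddings co-evolve: better attention coefficients yield cleaner structural clusters, and cleaner clusters give a stronger training signal, which is the mutual-enhancement effect the abstract refers to.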