Generative self-supervised learning (SSL), especially with masked autoencoders, has become one of the most exciting learning paradigms and has shown great potential in handling graph data. However, real-world graphs are typically heterogeneous, which poses three critical challenges that existing methods ignore: (1) how to capture complex graph structure, (2) how to incorporate various node attributes, and (3) how to encode different node positions. In light of this, we study the problem of generative SSL on heterogeneous graphs and propose HGMAE, a novel heterogeneous graph masked autoencoder model that addresses these challenges. HGMAE captures comprehensive graph information via two innovative masking techniques and three unique training strategies. In particular, we first develop metapath masking and adaptive attribute masking with a dynamic mask rate to enable effective and stable learning on heterogeneous graphs. We then design three training strategies: metapath-based edge reconstruction to capture complex structural information, target attribute restoration to incorporate various node attributes, and positional feature prediction to encode node positional information. Extensive experiments demonstrate that HGMAE outperforms both contrastive and generative state-of-the-art baselines on several tasks across multiple datasets.
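As a rough illustration of attribute masking with a dynamic mask rate, the following minimal sketch may help. It is not the HGMAE implementation: the abstract does not specify the schedule or the masking rule, so the linear schedule, the `start_rate` and `end_rate` values, the use of zero vectors in place of a mask token, and the function names `dynamic_mask_rate` and `mask_attributes` are all assumptions made for illustration.

```python
import random


def dynamic_mask_rate(epoch, total_epochs, start_rate=0.5, end_rate=0.8):
    """Mask rate for a given epoch.

    Assumption: a linear schedule from start_rate to end_rate. The abstract
    only says the rate is dynamic; it does not give the schedule or values.
    """
    if total_epochs <= 1:
        return end_rate
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start_rate + frac * (end_rate - start_rate)


def mask_attributes(features, mask_rate, rng):
    """Replace the attribute vectors of a random subset of nodes with zeros.

    features: list of per-node attribute lists (one row per node).
    Returns (masked_features, masked_indices). The zero vector stands in for
    a mask token here (an assumption); the input is left unmodified.
    """
    num_nodes = len(features)
    num_masked = int(round(mask_rate * num_nodes))
    masked_idx = sorted(rng.sample(range(num_nodes), num_masked))
    masked_set = set(masked_idx)
    masked = [
        [0.0] * len(row) if i in masked_set else list(row)
        for i, row in enumerate(features)
    ]
    return masked, masked_idx


# Usage: mask a toy attribute matrix with the rate for the first epoch.
rng = random.Random(0)
toy_features = [[float(i), float(i) + 0.5] for i in range(10)]
rate = dynamic_mask_rate(epoch=0, total_epochs=10)
masked_features, masked_idx = mask_attributes(toy_features, rate, rng)
```

In a masked autoencoder setup, the restoration loss would typically be computed only on the nodes listed in `masked_idx`, comparing the decoder's output against the original rows of `toy_features`.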