Inspired by the success of contrastive learning (CL) in computer vision and natural language processing, graph contrastive learning (GCL) has been developed to learn discriminative node representations on graph datasets. However, the development of GCL on Heterogeneous Information Networks (HINs) is still in its infancy. For example, it is unclear how to augment HINs without substantially altering their underlying semantics, or how to design a contrastive objective that fully captures those rich semantics. Moreover, early investigations demonstrate that CL suffers from sampling bias, whereas conventional debiasing techniques are empirically shown to be inadequate for GCL. How to mitigate sampling bias in heterogeneous GCL is thus another important problem. To address these challenges, we propose a novel Heterogeneous Graph Contrastive Multi-view Learning (HGCML) model. In particular, we use metapaths as augmentations to generate multiple subgraphs as multi-views, and propose a contrastive objective that maximizes the mutual information between any pair of metapath-induced views. To alleviate sampling bias, we further propose a positive sampling strategy that explicitly selects positives for each node by jointly considering the semantic and structural information preserved in each metapath view. Extensive experiments demonstrate that HGCML consistently outperforms state-of-the-art baselines on five real-world benchmark datasets.