Existing graph contrastive learning (GCL) techniques typically require two forward passes per instance to construct the contrastive loss, a design that is effective at capturing the low-frequency signals of node features. This dual-pass design has shown empirical success on homophilic graphs, but its effectiveness on heterophilic graphs, where directly connected nodes typically have different labels, is unknown. In addition, existing GCL approaches do not provide strong performance guarantees. Together with their unpredictable behavior on heterophilic graphs, this limits their applicability in real-world settings. A natural question then arises: can we design a GCL method that works for both homophilic and heterophilic graphs and comes with a performance guarantee? To answer this question, we theoretically study the concentration property of features obtained by neighborhood aggregation on homophilic and heterophilic graphs, introduce a single-pass graph contrastive learning loss based on this property, and provide performance guarantees for the minimizer of the loss on downstream tasks. As a direct consequence of our analysis, we implement the Single-Pass Graph Contrastive Learning method (SP-GCL). Empirically, on 14 benchmark datasets with varying degrees of homophily, the features learned by SP-GCL match or outperform existing strong baselines with significantly less computational overhead, which demonstrates the usefulness of our findings in practice.
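To make the terms "homophily" and "neighborhood aggregation" concrete, the following is a minimal sketch on a toy graph. It is not the SP-GCL implementation: the function names `edge_homophily` and `mean_aggregate`, the 4-node cycle, and the one-hot features are all assumptions chosen for illustration. It computes the edge homophily ratio (the fraction of edges whose endpoints share a label) and one round of mean aggregation over neighbors.

```python
from collections import defaultdict


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def mean_aggregate(edges, features):
    """One round of mean neighborhood aggregation on an undirected graph."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    dim = len(features[0])
    out = []
    for node in range(len(features)):
        nbrs = neighbors.get(node, [])
        if not nbrs:
            # An isolated node simply keeps its own feature vector.
            out.append(list(features[node]))
            continue
        out.append(
            [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        )
    return out


# Hypothetical toy graph: the 4-cycle 0-1-2-3-0 with alternating labels,
# so every edge joins nodes of different classes (fully heterophilic).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = [0, 1, 0, 1]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

h = edge_homophily(edges, labels)
aggregated = mean_aggregate(edges, features)
print("edge homophily:", h)
print("aggregated features:", aggregated)
```

In this toy case the homophily ratio is 0, yet nodes 0 and 2, which share a label, end up with identical aggregated features. This only illustrates the kind of behavior the abstract refers to as a concentration property, under assumptions far simpler than those of the paper's analysis.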