Graph contrastive learning (GCL) alleviates the heavy reliance on label information in graph representation learning (GRL) via self-supervised learning schemes. The core idea is to learn by maximising mutual information between similar instances, which requires similarity computation between two node instances. However, this operation can be computationally expensive. For example, the time complexity of two commonly adopted contrastive loss functions (i.e., InfoNCE and the JSD estimator) for a node is $O(ND)$ and $O(D)$, respectively, where $N$ is the number of nodes and $D$ is the embedding dimension. Additionally, GCL normally requires a large number of training epochs to be well-trained on large-scale datasets. Motivated by an observation of a technical defect (i.e., inappropriate usage of the Sigmoid function) in two representative GCL works, DGI and MVGRL, we revisit GCL and introduce a new learning paradigm for self-supervised GRL, namely Group Discrimination (GD), and propose a novel GD-based method called Graph Group Discrimination (GGD). Instead of computing similarities, GGD directly discriminates two groups of summarised node instances with a simple binary cross-entropy loss, so the loss computation for a node takes only $O(1)$. In addition, GGD requires far fewer training epochs than GCL methods to obtain competitive performance on large-scale datasets. These two advantages make GGD highly efficient. Extensive experiments show that GGD outperforms state-of-the-art self-supervised methods on eight datasets. In particular, GGD can be trained in 0.18 seconds (6.44 seconds including data preprocessing) on ogbn-arxiv, which is orders of magnitude (10,000+) faster than GCL baselines while consuming much less memory. Trained for 9 hours on ogbn-papers100M, which has billions of edges, GGD outperforms its GCL counterparts in both accuracy and efficiency.
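To illustrate why Group Discrimination costs only $O(1)$ per node, the following is a minimal NumPy sketch of the idea described above: each node embedding is summarised into a single scalar logit, and a binary cross-entropy loss separates the "real" group from the "corrupted" group. The embeddings, group sizes, and the sum-based summariser are illustrative assumptions, not the authors' exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 4, 8  # hypothetical: 4 nodes, 8-dimensional embeddings

# Stand-in embeddings: H_pos from the original graph, H_neg from a
# corrupted view (e.g. shuffled node features). Real GGD would produce
# these with a GNN encoder; random values here are for illustration only.
H_pos = rng.normal(size=(N, D))
H_neg = rng.normal(size=(N, D))

def group_discrimination_loss(H_pos, H_neg):
    """BCE loss discriminating two groups of summarised node embeddings.

    Each node is first summarised into one scalar logit (here by summing
    its embedding dimensions), so the per-node loss term is a single
    scalar comparison -- O(1) -- rather than a similarity against other
    nodes as in InfoNCE.
    """
    logits = np.concatenate([H_pos.sum(axis=1), H_neg.sum(axis=1)])
    labels = np.concatenate([np.ones(len(H_pos)), np.zeros(len(H_neg))])
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    eps = 1e-12  # numerical safety for log
    return -np.mean(labels * np.log(probs + eps)
                    + (1 - labels) * np.log(1 - probs + eps))

loss = group_discrimination_loss(H_pos, H_neg)
```

Note that the loss depends on each node only through its own scalar logit, which is what removes the $O(ND)$ pairwise-similarity term of InfoNCE-style objectives.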