Recently developed graph contrastive learning (GCL) approaches compare two different "views" of the same graph in order to learn node/graph representations. The core assumption of these approaches is that graph augmentation can generate several structurally different but semantically similar graph structures, and that the identity labels of the original and augmented graphs/nodes should therefore be identical. However, in this paper, we observe that this assumption does not always hold; for example, any perturbation of nodes or edges in a molecular graph will change the graph's label to some degree. Therefore, we argue that augmenting the graph structure should be accompanied by an adaptation of the labels used in the contrastive loss. Based on this idea, we propose ID-MixGCL, which simultaneously modulates both the input graph and the corresponding identity labels with a controllable degree of change, enabling the capture of fine-grained representations from unlabeled graphs. Experimental results demonstrate that ID-MixGCL improves performance on graph classification and node classification tasks, achieving significant gains of 3-29 absolute percentage points over state-of-the-art techniques on the Cora, IMDB-B, and IMDB-M datasets.
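The core mechanism described above — interpolating representations while softening the identity targets of the contrastive loss by the same mixing coefficient — can be sketched as follows. This is a minimal illustration of the idea, not the paper's exact formulation: the function name, the Beta-distributed mixing coefficient, and the cosine-similarity InfoNCE-style objective are illustrative assumptions.

```python
import numpy as np

def id_mix_contrastive_loss(z_anchor, z_view, alpha=1.0, tau=0.5, rng=None):
    """Sketch of an identity-mixing contrastive loss.

    z_anchor: (n, d) node embeddings from one view.
    z_view:   (n, d) node embeddings from the augmented view.
    Each node in the augmented view is mixed with a randomly permuted
    partner; its identity target is softened accordingly, so the label
    changes together with the input, with lam controlling the degree.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = z_view.shape[0]
    perm = rng.permutation(n)                  # random mixing partners
    lam = rng.beta(alpha, alpha)               # controllable mixing degree

    # Mix embeddings of each node with its partner (mixup-style).
    z_mix = lam * z_view + (1.0 - lam) * z_view[perm]

    # Cosine similarities between mixed nodes and anchors, scaled by tau.
    a = z_mix / np.linalg.norm(z_mix, axis=1, keepdims=True)
    b = z_anchor / np.linalg.norm(z_anchor, axis=1, keepdims=True)
    logits = (a @ b.T) / tau

    # Numerically stable log-softmax over anchors for each mixed node.
    m = logits.max(axis=1, keepdims=True)
    log_p = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))

    # Soft identity labels: weight lam on the node's own identity,
    # (1 - lam) on its mixing partner's identity.
    idx = np.arange(n)
    loss = -(lam * log_p[idx, idx] + (1.0 - lam) * log_p[idx, perm]).mean()
    return loss
```

With `lam = 1` this reduces to a standard contrastive objective with hard identity labels; smaller `lam` shifts probability mass toward the partner's identity, matching the intuition that a more heavily perturbed input should no longer carry its original identity label unchanged.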