Graph representation learning has attracted considerable attention in recent years. Existing graph neural networks, which are fed the complete graph data, do not scale due to computation and memory limits. It thus remains a great challenge to capture the rich information in large-scale graph data. Moreover, these methods mainly focus on supervised learning and depend heavily on node label information, which is expensive to obtain in the real world. Unsupervised network embedding approaches, in contrast, overemphasize node proximity, so their learned representations can hardly be used directly in downstream application tasks. In recent years, emerging self-supervised learning has offered a potential solution to the aforementioned problems. However, existing self-supervised methods also operate on the complete graph data and, in defining their mutual-information-based loss terms, are biased toward fitting either global or very local (1-hop neighborhood) graph structures. In this paper, we propose a novel self-supervised representation learning method via Subgraph Contrast, namely \textsc{Subg-Con}, which exploits the strong correlation between central nodes and their sampled subgraphs to capture regional structure information. Instead of learning on the complete input graph, \textsc{Subg-Con} learns node representations, with a novel data augmentation strategy, through a contrastive loss defined over subgraphs sampled from the original graph. Compared with existing graph representation learning approaches, \textsc{Subg-Con} has prominent advantages in weaker supervision requirements, model learning scalability, and parallelization. Extensive experiments on multiple real-world large-scale benchmark datasets from different domains verify both the effectiveness and the efficiency of our work compared with classic and state-of-the-art graph representation learning approaches.
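The central-node-versus-sampled-subgraph contrast described above can be illustrated with a minimal sketch. This is an assumed simplification, not the paper's exact algorithm: the subgraph sampler is plain BFS rather than importance sampling, the encoder is a mean-pooling stand-in for a GNN, and the critic and margin loss (`score`, `contrastive_loss`) are hypothetical names chosen for illustration. A node is pulled toward the embedding of its own sampled subgraph (positive pair) and pushed away from another node's subgraph (negative pair).

```python
import random

def sample_subgraph(adj, center, size=3):
    """BFS-sample up to `size` nodes around `center`
    (a stand-in for the paper's subgraph sampler)."""
    visited, frontier = [center], [center]
    while frontier and len(visited) < size:
        nxt = []
        for u in frontier:
            for v in adj.get(u, []):
                if v not in visited and len(visited) < size:
                    visited.append(v)
                    nxt.append(v)
        frontier = nxt
    return visited

def subgraph_embedding(features, nodes):
    """Mean-pool node features: a toy substitute for a GNN encoder + readout."""
    dim = len(next(iter(features.values())))
    return [sum(features[n][d] for n in nodes) / len(nodes) for d in range(dim)]

def score(node_emb, graph_emb):
    """Dot-product critic between a node and a subgraph embedding."""
    return sum(a * b for a, b in zip(node_emb, graph_emb))

def contrastive_loss(pos, neg, margin=1.0):
    """Margin ranking loss: positive pair should outscore the negative."""
    return max(0.0, margin - pos + neg)

# Toy graph with two clusters; features roughly align with cluster membership.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
features = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.8, 0.2],
            3: [0.0, 1.0], 4: [0.1, 0.9], 5: [0.2, 0.8]}

sub_own = sample_subgraph(adj, 0)      # node 0's regional subgraph
sub_other = sample_subgraph(adj, 3)    # a different node's subgraph (negative)
pos = score(features[0], subgraph_embedding(features, sub_own))
neg = score(features[0], subgraph_embedding(features, sub_other))
loss = contrastive_loss(pos, neg)      # positive pair already outscores negative
```

Because each loss term touches only a small sampled subgraph rather than the full graph, batches of such pairs can be processed independently, which is the source of the scalability and parallelization advantages claimed in the abstract.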