In recent years, self-supervised learning of general graph characteristics has been considered a promising paradigm for graph representation learning. The core of a self-supervised strategy for graph neural networks (GNNs) is the choice of positive samples. However, existing GNNs typically update node representations by aggregating information from neighboring nodes. This leads to an over-reliance on neighboring positive samples, i.e., homophilous samples, while ignoring long-range positive samples, i.e., nodes that are far apart on the graph but structurally equivalent. We call this problem "neighbor bias," and it can reduce the generalization performance of GNNs. In this paper, we argue that the generalization properties of GNNs should be determined by combining homophilous samples and structurally equivalent samples, which we call the "GC combination hypothesis." We therefore propose a topological signal-driven self-supervised method that uses a structural equivalence sampling strategy guided by topological information. First, we extract multiscale topological features using persistent homology. Then we compute the structural equivalence of node pairs from their topological features. In particular, we design a topological loss function that pulls non-neighboring node pairs with high structural equivalence together in the representation space to alleviate neighbor bias. Finally, we use a joint training mechanism to adjust the effect of structural equivalence on the model so that it fits datasets with different characteristics. We conducted experiments on the node classification task across seven graph datasets. The results show that topological signal enhancement can effectively improve model performance.
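The sampling and loss steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact form of the structural equivalence score, the topological loss, or the joint objective. The sketch assumes that each node already has a fixed-length topological feature vector (for example, a vectorized persistence diagram of its neighborhood, which is not computed here), that structural equivalence is measured by cosine similarity with a hypothetical `threshold`, and that the joint objective is a weighted sum controlled by a hypothetical `weight`. All function names are illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def select_equivalent_pairs(topo_feats, edges, threshold=0.9):
    """Return non-adjacent node pairs (i, j), i < j, whose topological
    feature vectors have cosine similarity >= threshold (assumed score)."""
    neighbors = {frozenset(e) for e in edges}
    return [
        (i, j)
        for i, j in combinations(range(len(topo_feats)), 2)
        if frozenset((i, j)) not in neighbors
        and cosine(topo_feats[i], topo_feats[j]) >= threshold
    ]


def topological_loss(emb, pairs):
    """Mean (1 - cosine similarity) over the selected pairs; minimizing it
    pulls structurally equivalent nodes together in embedding space."""
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(emb[i], emb[j]) for i, j in pairs) / len(pairs)


def joint_loss(base_loss, topo_loss, weight=0.5):
    """Weighted sum of a base self-supervised loss and the topological loss
    (assumed form of the joint training objective)."""
    return base_loss + weight * topo_loss


# Path graph 0-1-2-3: the endpoints (0 and 3) are far apart but play the
# same structural role, so they share a toy topological feature vector.
edges = [(0, 1), (1, 2), (2, 3)]
topo_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
pairs = select_equivalent_pairs(topo_feats, edges)

# Toy embeddings in which nodes 0 and 3 are orthogonal.
emb = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
total = joint_loss(base_loss=2.0, topo_loss=topological_loss(emb, pairs))
```

In this toy graph, nodes 1 and 2 also share a feature vector, but they are neighbors and are therefore excluded. Only the long-range pair contributes to the topological term. Setting `weight` to zero recovers the base objective, which is one simple way to reduce the influence of structural equivalence on datasets where it helps less.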