Graph machine learning (GML) has made great progress in tasks such as node classification, link prediction, and graph classification. However, real-world graphs are often structurally imbalanced: only a few hub nodes have a dense local structure and high influence. This imbalance may compromise the robustness of existing GML models, especially when learning on tail nodes. This paper proposes a selective graph augmentation method (SAug) to address this problem. First, a PageRank-based sampling strategy is designed to identify hub nodes and tail nodes in the graph. Second, a selective augmentation strategy is proposed that, on the one hand, drops the noisy neighbors of hub nodes and, on the other hand, discovers latent neighbors and generates pseudo neighbors for tail nodes. This strategy also alleviates the structural imbalance between the two types of nodes. Finally, a GNN model is retrained on the augmented graph. Extensive experiments demonstrate that SAug significantly improves the backbone GNNs and outperforms competing graph augmentation methods and hub/tail-aware methods.
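The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the SAug implementation: the abstract does not specify how hub nodes are sampled, how noisy neighbors are detected, or how latent neighbors are found, so the code below makes its own assumptions. It ranks nodes with a plain power-iteration PageRank, treats the top `hub_frac` fraction as hubs (a hypothetical deterministic cutoff standing in for the paper's sampling strategy), and uses cosine similarity of node features with hypothetical thresholds `drop_below` and `add_above` to decide which hub edges to drop and which tail edges to add. Pseudo-neighbor generation and GNN retraining are omitted.

```python
import math


def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on an undirected graph {node: set(neighbors)}."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        # Rank held by isolated nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in adj if not adj[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in adj}
        for v, nbrs in adj.items():
            for u in nbrs:
                new[u] += damping * rank[v] / len(nbrs)
        rank = new
    return rank


def split_hub_tail(rank, hub_frac=0.2):
    """Assumption: the top hub_frac of nodes by PageRank are hubs, the rest tails."""
    order = sorted(rank, key=rank.get, reverse=True)
    k = max(1, int(len(order) * hub_frac))
    return set(order[:k]), set(order[k:])


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def selective_augment(adj, feats, hubs, tails, drop_below=0.1, add_above=0.9):
    """Drop dissimilar neighbors of hubs; connect tails to similar non-neighbors.

    Feature similarity is an assumed stand-in for the paper's criteria.
    """
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    for h in hubs:
        for u in adj[h]:
            if cosine(feats[h], feats[u]) < drop_below:
                new_adj[h].discard(u)
                new_adj[u].discard(h)
    for t in tails:
        for u in adj:
            if u != t and u not in adj[t] and cosine(feats[t], feats[u]) > add_above:
                new_adj[t].add(u)
                new_adj[u].add(t)
    return new_adj


# Toy graph: node 0 is a hub; node 3 is a dissimilar ("noisy") neighbor of it.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0, 5}, 5: {4}}
feats = {0: (1, 0), 1: (1, 0), 2: (1, 0), 3: (0, 1), 4: (1, 0), 5: (0, 1)}

rank = pagerank(adj)
hubs, tails = split_hub_tail(rank)
new_adj = selective_augment(adj, feats, hubs, tails)
```

On this toy graph, node 0 is selected as the only hub, its edge to the dissimilar node 3 is dropped, and the tail nodes 3 and 5, which share the same features, become connected. A GNN would then be retrained on `new_adj` in place of `adj`.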