Data augmentation has been widely used to improve the generalizability of machine learning models. However, comparatively little work studies data augmentation for graphs. This is largely due to the complex, non-Euclidean structure of graphs, which limits the possible manipulation operations: augmentation operations commonly used in vision and language have no analogs for graphs. Our work studies graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node classification. We discuss practical and theoretical motivations, considerations, and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in a given graph structure. Our main contribution is the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets.
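To make the idea of "promoting intra-class edges and demoting inter-class edges via edge prediction" concrete, the sketch below shows one simple way edge probabilities could be used to modify a graph: add the missing node pairs the predictor rates most likely and drop the existing edges it rates least likely, then compare the fraction of intra-class edges before and after. This is a minimal illustration under stated assumptions, not GAug's exact procedure. The function names `augment_edges` and `intra_class_ratio`, the parameters `n_add` and `n_remove`, and the label-based "oracle" probabilities (a hypothetical stand-in for a trained neural edge predictor) are all assumptions introduced here.

```python
from itertools import combinations


def augment_edges(edges, probs, n_nodes, n_add, n_remove):
    """Return a modified undirected edge set (hypothetical helper).

    edges: set of (u, v) pairs with u < v
    probs: dict mapping every pair (u, v), u < v, to an edge probability;
           assumed to come from some edge predictor (not specified here)
    """
    edges = set(edges)
    # Candidate additions: every node pair that is not already an edge.
    candidates = [p for p in combinations(range(n_nodes), 2) if p not in edges]
    # Promote: add the n_add missing pairs with the highest predicted probability.
    to_add = sorted(candidates, key=lambda p: probs[p], reverse=True)[:n_add]
    # Demote: remove the n_remove existing edges with the lowest predicted probability.
    to_remove = sorted(edges, key=lambda p: probs[p])[:n_remove]
    return (edges - set(to_remove)) | set(to_add)


def intra_class_ratio(edges, labels):
    """Fraction of edges whose endpoints share a class label (edge homophily)."""
    if not edges:
        return 0.0
    return sum(labels[u] == labels[v] for u, v in edges) / len(edges)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1]
    edges = {(0, 1), (1, 3), (3, 4)}  # (1, 3) is an inter-class edge
    # Hypothetical oracle predictor: high probability for same-class pairs only.
    probs = {p: (0.9 if labels[p[0]] == labels[p[1]] else 0.1)
             for p in combinations(range(len(labels)), 2)}
    new_edges = augment_edges(edges, probs, len(labels), n_add=1, n_remove=1)
    print(sorted(new_edges))                     # → [(0, 1), (0, 2), (3, 4)]
    print(intra_class_ratio(new_edges, labels))  # → 1.0
```

In this toy run the inter-class edge (1, 3) is removed and the intra-class pair (0, 2) is added, raising the intra-class edge ratio from 2/3 to 1.0. With a real, imperfect edge predictor the gain would depend on how well its probabilities separate intra-class from inter-class pairs.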