A Graph Data Augmentation Strategy with Entropy Preservation

Graph Convolutional Networks (GCNs), proposed by Kipf and Welling, are effective models for semi-supervised learning, but they suffer from over-smoothing, which weakens their representation ability. Several recent works tackle this limitation by randomly perturbing the graph topology or the feature matrix to generate augmented data as training input. However, these operations inevitably damage the integrity of information structures and sacrifice the smoothness of the feature manifold. In this paper, we first introduce a novel definition of graph entropy as a quantitative measure of the smoothness of a data manifold, and then point out that this graph entropy is controlled by triangle motif-based information structures. To preserve graph entropy, we propose an effective strategy that generates randomly perturbed training data while maintaining both graph topology and graph entropy. Extensive experiments on real-world datasets verify that the proposed method improves semi-supervised node classification accuracy compared with a range of baselines. Moreover, the proposed approach can significantly enhance the robustness of the GCN training process.
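The claim that random topology perturbation damages triangle motif-based structures can be illustrated with a small sketch. This is not the paper's method, and it does not compute the paper's graph entropy, which the abstract does not define. It only shows, on a toy graph, how independently dropping edges (a generic perturbation assumed here for illustration) reduces the number of triangles. The helper names `count_triangles` and `drop_edges`, the toy graph, and the drop rate are all hypothetical.

```python
import random
from itertools import combinations


def count_triangles(n, edges):
    """Count triangles in an undirected simple graph on nodes 0..n-1.

    Assumes `edges` holds each undirected edge once and has no self-loops.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    for u, v in edges:
        # Each common neighbour of u and v closes one triangle on edge (u, v).
        total += len(adj[u] & adj[v])
    # Every triangle is found once for each of its three edges.
    return total // 3


def drop_edges(edges, drop_rate, rng):
    """Keep each edge independently with probability 1 - drop_rate."""
    return [e for e in edges if rng.random() >= drop_rate]


# Toy graph (an assumption for illustration): the complete graph on
# 6 nodes, which has 15 edges and 20 triangles.
n = 6
edges = list(combinations(range(n), 2))

rng = random.Random(0)
perturbed = drop_edges(edges, drop_rate=0.3, rng=rng)

triangles_before = count_triangles(n, edges)
triangles_after = count_triangles(n, perturbed)
print("triangles before:", triangles_before)
print("triangles after: ", triangles_after)
```

Because removing an edge can only destroy triangles, the count after perturbation never exceeds the count before. An entropy-preserving augmentation, as described in the abstract, would instead need to constrain the perturbation so that the triangle-based structure is left intact.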
