It is well known that the success of graph neural networks (GNNs) relies heavily on abundant human-annotated data, which is laborious to obtain and not always available in practice. When only a few labeled nodes are available, how to develop highly effective GNNs remains understudied. Though self-training has been shown to be powerful for semi-supervised learning, its application to graph-structured data may fail because (1) larger receptive fields are not leveraged to capture long-range node interactions, which exacerbates the difficulty of propagating feature-label patterns from labeled to unlabeled nodes; and (2) limited labeled data makes it challenging to learn well-separated decision boundaries for different node classes without explicitly capturing the underlying semantic structure. To address the challenges of capturing informative structural and semantic knowledge, we propose a new graph data augmentation framework, AGST (Augmented Graph Self-Training), which builds two novel (i.e., structural and semantic) augmentation modules on top of a decoupled GST backbone. In this work, we investigate whether this novel framework can learn an effective graph predictive model with extremely limited labeled nodes. We conduct comprehensive evaluations on semi-supervised node classification under different scenarios of limited labeled-node availability. The experimental results demonstrate the unique contributions of the proposed data augmentation framework to node classification with few labeled nodes.