GraphSR: A Data Augmentation Algorithm for Imbalanced Node Classification

Graph neural networks (GNNs) have achieved great success in node classification tasks. However, existing GNNs are naturally biased towards the majority classes, which have more labelled data, and tend to ignore minority classes with relatively few labelled nodes. Traditional techniques often resort to over-sampling, which may cause overfitting. More recently, some works have proposed synthesizing additional nodes for minority classes from the labelled nodes; however, there is no guarantee that the generated nodes actually represent the corresponding minority classes. In fact, improperly synthesized nodes may result in insufficient generalization of the algorithm. To resolve this problem, in this paper we seek to automatically augment the minority classes from the massive pool of unlabelled nodes in the graph. Specifically, we propose \textit{GraphSR}, a novel self-training strategy that augments the minority classes with a significantly diverse set of unlabelled nodes, based on a Similarity-based selection module and a Reinforcement Learning (RL) selection module. The first module finds a subset of unlabelled nodes that are most similar to the labelled minority nodes, and the second further selects the representative and reliable nodes from that subset via an RL technique. Furthermore, the RL-based module can adaptively determine the sampling scale according to the current training data. This strategy is general and can be easily combined with different GNN models. Our experiments demonstrate that the proposed approach outperforms the state-of-the-art baselines on various class-imbalanced datasets.
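
The similarity-based selection step can be illustrated with a minimal sketch. The abstract does not say which similarity measure or which node representation GraphSR uses, so everything below is an assumption made for illustration: node embeddings are taken as given (for example from a pretrained GNN), similarity is cosine similarity to the centroid of the labelled minority nodes, unlabelled nodes are marked with the label -1, and the function name `select_similar_unlabelled` and the parameter `k` are hypothetical. The RL-based module, which would further filter these candidates and decide how many to keep, is not sketched here.

```python
import numpy as np


def select_similar_unlabelled(embeddings, labels, minority_class, k):
    """Return indices of up to k unlabelled nodes closest to a minority class.

    embeddings: (n_nodes, d) array of node embeddings (assumed precomputed).
    labels: (n_nodes,) int array; -1 marks an unlabelled node (assumed convention).
    minority_class: integer id of the class to augment.
    k: maximum number of candidate nodes to return.
    """
    labelled_minority = embeddings[labels == minority_class]
    if labelled_minority.shape[0] == 0:
        raise ValueError("no labelled nodes for the requested class")

    # Assumption: the class is summarised by the mean of its labelled embeddings.
    centroid = labelled_minority.mean(axis=0)

    unlabelled_idx = np.flatnonzero(labels == -1)
    candidates = embeddings[unlabelled_idx]

    # Cosine similarity between each unlabelled node and the class centroid.
    eps = 1e-12  # guards against division by zero for all-zero vectors
    sims = candidates @ centroid / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(centroid) + eps
    )

    # Most similar first; keep at most k candidates.
    order = np.argsort(-sims, kind="stable")[:k]
    return unlabelled_idx[order]


if __name__ == "__main__":
    emb = np.array(
        [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05], [0.1, 1.0], [-1.0, 0.0]]
    )
    lab = np.array([1, -1, 0, -1, -1, -1])
    print(select_similar_unlabelled(emb, lab, minority_class=1, k=2))
```

In a self-training loop, the returned indices would be candidate pseudo-labelled nodes for the minority class. A fixed `k` is a simplification: according to the abstract, GraphSR lets the RL-based module determine the sampling scale adaptively.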
