Node classification is an important research topic in graph learning. Graph neural networks (GNNs) have achieved state-of-the-art performance on node classification. However, existing GNNs assume that node samples are balanced across classes, whereas in many real-world scenarios some classes have far fewer instances than others. Directly training a GNN classifier in this setting under-represents samples from the minority classes and results in sub-optimal performance. It is therefore important to develop GNNs for imbalanced node classification, yet work on this problem is rather limited. Hence, we seek to extend previous imbalanced learning techniques for i.i.d. data to the imbalanced node classification task to facilitate GNN classifiers. In particular, we adopt synthetic minority over-sampling algorithms, as they have been found to be the most effective and stable. This task is non-trivial, because previous synthetic minority over-sampling algorithms fail to provide relation information for newly synthesized samples, which is vital for learning on graphs. Moreover, node attributes are high-dimensional, so directly over-sampling in the original input domain could generate out-of-domain samples, which may impair the accuracy of the classifier. We propose a novel framework, GraphSMOTE, in which an embedding space is constructed to encode the similarity among the nodes. New samples are synthesized in this space to ensure their genuineness. In addition, an edge generator is trained simultaneously to model the relation information and provide it for those new samples. This framework is general and can be easily extended into different variations. The proposed framework is evaluated on three different datasets, and it outperforms all baselines by a large margin.
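The two steps the abstract describes, synthesizing minority samples in an embedding space and then predicting edges for them, can be illustrated with a minimal sketch. This is not the GraphSMOTE implementation: the abstract gives no formulas, so the sketch assumes the standard SMOTE rule (interpolate between a minority sample and its nearest same-class neighbor) and a hypothetical bilinear edge scorer with a sigmoid and a fixed threshold. The function names `smote_in_embedding_space` and `predict_edges`, the `weight` matrix, and the `threshold` value are all illustrative assumptions, and the embeddings are random stand-ins for the output of a trained GNN encoder.

```python
import numpy as np


def smote_in_embedding_space(emb, labels, minority_class, n_new, rng):
    """Synthesize n_new minority embeddings by SMOTE-style interpolation.

    Assumption: each synthetic point lies on the segment between a minority
    node and its nearest minority neighbor in the embedding space.
    """
    minority_idx = np.flatnonzero(labels == minority_class)
    if minority_idx.size < 2:
        raise ValueError("need at least two minority nodes to interpolate")
    minority = emb[minority_idx]

    # Pairwise Euclidean distances among minority nodes; mask self-distances.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    nearest = dists.argmin(axis=1)

    # Pick random seed nodes and random interpolation coefficients in [0, 1).
    seeds = rng.integers(0, minority_idx.size, size=n_new)
    delta = rng.random((n_new, 1))
    return minority[seeds] + delta * (minority[nearest[seeds]] - minority[seeds])


def predict_edges(new_emb, emb, weight, threshold=0.5):
    """Score candidate edges between synthetic and existing nodes.

    Assumption: a bilinear scorer sigmoid(h_new @ W @ h_old^T); in a trained
    model W would be learned from the observed adjacency matrix.
    """
    scores = 1.0 / (1.0 + np.exp(-(new_emb @ weight @ emb.T)))
    return scores > threshold, scores


# Toy usage: 10 nodes with 4-dimensional embeddings, 3 of them minority.
rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 4))          # stand-in for GNN encoder output
labels = np.array([0] * 7 + [1] * 3)
new_emb = smote_in_embedding_space(emb, labels, minority_class=1, n_new=5, rng=rng)
new_adj, scores = predict_edges(new_emb, emb, weight=np.eye(4))
```

Interpolating in the embedding space rather than on raw attributes is the point the abstract stresses: a convex combination of two nearby minority embeddings stays inside the region the encoder has already mapped minority nodes to, whereas the same operation on high-dimensional raw features may leave the data domain. The predicted adjacency rows are what let the synthetic nodes take part in message passing.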