Unlike deep neural networks for non-graph data classification, graph neural networks (GNNs) exploit information exchange between nodes (or samples) when learning node representations. On nearly all existing benchmark GNN data sets, the class distribution is imbalanced or even highly skewed. This imbalance causes nodes in minority classes to be misclassified and can degrade classification performance on the entire data set. This study examines how the imbalance problem affects GNN performance and proposes new methods to address it. First, a node-level index, the label difference index ($LDI$), is defined to quantitatively analyze the relationship between imbalance and misclassification. The fewer the samples in a class, the higher its average $LDI$; the higher the $LDI$ of a sample, the more likely it is to be misclassified. Based on $LDI$, we define a new loss and propose four new methods. Experimental results show that three of the four proposed methods achieve better classification accuracy in both transductive and inductive settings. The $LDI$ can also be applied to other GNNs.
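The abstract does not give the formula for $LDI$, so the following is only a hypothetical, illustrative sketch of a node-level "label difference" signal: for each node, the fraction of its neighbors whose labels differ from its own, averaged per class. It is not necessarily the paper's definition of $LDI$; it merely shows, on a toy graph, how such an index can be computed and how a minority class can end up with a higher average value, as the abstract reports.

```python
# Hypothetical sketch only: the paper's LDI formula is not given in this abstract.
import numpy as np

def label_difference(adjacency: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """For each node, return the fraction of its neighbors with a different label.

    adjacency : (n, n) binary adjacency matrix of the graph
    labels    : (n,)  integer class label of each node
    """
    n = adjacency.shape[0]
    scores = np.zeros(n)
    for v in range(n):
        neighbors = np.flatnonzero(adjacency[v])
        if neighbors.size == 0:
            continue  # isolated node: no neighborhood evidence
        scores[v] = np.mean(labels[neighbors] != labels[v])
    return scores

def per_class_average(scores: np.ndarray, labels: np.ndarray) -> dict:
    """Average the node-level scores within each class."""
    return {int(c): float(scores[labels == c].mean()) for c in np.unique(labels)}

if __name__ == "__main__":
    # Toy graph: class 0 is the majority (4 nodes), class 1 the minority (2 nodes).
    adjacency = np.array([
        [0, 1, 1, 0, 1, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 0, 1],
        [1, 0, 0, 0, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ])
    labels = np.array([0, 0, 0, 0, 1, 1])
    scores = label_difference(adjacency, labels)
    # The minority class gets the higher per-class average in this toy example.
    print(per_class_average(scores, labels))
```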