In recent years, the expressive power of Graph Convolutional Networks (GCNs) has enabled significant breakthroughs in face clustering. However, little attention has been paid to GCN-based clustering on imbalanced data. Although the imbalance problem has been studied extensively, imbalanced data affects the GCN-based linkage prediction task quite differently, causing problems in two respects: imbalanced linkage labels and biased graph representations. The problem of imbalanced linkage labels is similar to the one found in image classification, whereas biased graph representations are a problem particular to GCN-based clustering via linkage prediction. Significantly biased graph representations in training can cause catastrophic overfitting of a GCN model. To tackle these problems, we evaluate through extensive experiments whether existing methods for imbalanced image classification are feasible on graphs, and we present a new method that alleviates the imbalanced labels and also augments graph representations using a Reverse-Imbalance Weighted Sampling (RIWS) strategy, followed by analyses and discussions. The code and a series of imbalanced benchmark datasets synthesized from MS-Celeb-1M and DeepFashion are available at https://github.com/espectre/GCNs_on_imbalanced_datasets.
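The abstract does not spell out how RIWS works, so the following is only a minimal sketch of the general idea of weighting samples against the class imbalance, not the authors' method. It assumes plain inverse-frequency weighting over identity labels; the function names `inverse_frequency_weights` and `weighted_sample` and the toy label list are hypothetical and chosen purely for illustration.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return normalized per-sample weights proportional to 1 / class size.

    Assumption: generic inverse-frequency weighting, not the paper's RIWS.
    """
    counts = Counter(labels)
    raw = [1.0 / counts[y] for y in labels]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_sample(labels, k, seed=0):
    """Draw k sample indices with replacement using the weights above."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)


# Toy imbalanced identity labels: class 0 has 90 samples, class 1 has 10.
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)
picked = weighted_sample(labels, k=1000)

# Count how often each class was drawn under the reweighted sampling.
print(Counter(labels[i] for i in picked))
```

Under this weighting each class carries the same total sampling mass regardless of its size, so the minority class is drawn about as often as the majority class. Whether RIWS uses this exact weighting, or how it augments graph representations, should be checked against the paper and the linked repository.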