In recent years, benefiting from the expressive power of Graph Convolutional Networks (GCNs), face clustering has seen significant breakthroughs. However, little attention has been paid to GCN-based clustering on imbalanced data. Although the imbalance problem has been studied extensively, its impact on GCN-based linkage prediction is quite different, and it causes problems in two respects: imbalanced linkage labels and biased graph representations. The former resembles label imbalance in classic image classification, whereas the latter is specific to GCN-based clustering via linkage prediction. Significantly biased graph representations during training can cause catastrophic over-fitting of a GCN model. To tackle these challenges, we propose a linkage-based doubly imbalanced graph learning framework for face clustering. Within this framework, we evaluate whether existing methods for imbalanced image classification remain effective when applied to GCNs, and we present a new method that both alleviates label imbalance and augments graph representations using a Reverse-Imbalance Weighted Sampling (RIWS) strategy. In RIWS, probability-based class-balancing weights keep the overall distribution of positive and negative samples balanced, while weighted random sampling provides diverse subgraph structures, which effectively alleviates over-fitting and improves the representation ability of GCNs. Extensive experiments on a series of imbalanced benchmark datasets synthesized from MS-Celeb-1M and DeepFashion demonstrate the effectiveness and generality of the proposed method. Our implementation and the synthesized datasets will be made publicly available at https://github.com/espectre/GCNs_on_imbalanced_datasets.
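The abstract does not give the exact formulas behind RIWS, so the following Python sketch only illustrates its two ingredients under stated assumptions. First, reverse-imbalance sampling weights are assumed here to be inversely proportional to the number of images per identity, so tail identities are drawn more often. Second, the probability-based class-balancing weights for positive/negative linkage labels are assumed here to be inverse class frequencies. All function names are hypothetical, and the weighted sampler without replacement (the Efraimidis-Spirakis key method) is a standard technique swapped in for illustration, not necessarily the authors' choice.

```python
import heapq
import math
import random
from collections import Counter


def reverse_imbalance_weights(identity_labels):
    """Hypothetical: per-sample weight = 1 / (number of images of its identity).

    Samples from tail identities receive larger weights than samples from
    head identities, so weighted sampling "reverses" the imbalance.
    """
    counts = Counter(identity_labels)
    return [1.0 / counts[c] for c in identity_labels]


def weighted_sample_without_replacement(weights, k, seed=0):
    """Draw k distinct indices with probability proportional to `weights`.

    Efraimidis-Spirakis: give item i the key u_i ** (1 / w_i) with u_i
    uniform in [0, 1) and keep the k largest keys. A new seed yields a new
    node subset, i.e. a different subgraph to train on.
    """
    rng = random.Random(seed)
    keys = ((rng.random() ** (1.0 / w), i) for i, w in enumerate(weights))
    return [i for _, i in heapq.nlargest(k, keys)]


def class_balancing_weights(link_labels):
    """Hypothetical inverse-probability weights for linkage labels (1 = same identity).

    w_c = 1 / (K * p_c), with K the number of classes and p_c the empirical
    frequency of class c, so every class contributes equally to the loss.
    """
    n = len(link_labels)
    counts = Counter(link_labels)
    return {c: n / (len(counts) * m) for c, m in counts.items()}


def weighted_bce(probs, link_labels, class_weights, eps=1e-7):
    """Mean binary cross-entropy in which each term is scaled by its class weight."""
    total = 0.0
    for p, y in zip(probs, link_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        nll = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += class_weights[y] * nll
    return total / len(probs)


if __name__ == "__main__":
    # Toy long-tailed data: identity 0 has 6 images, identities 1 and 2 have 1 each.
    ids = [0, 0, 0, 0, 0, 0, 1, 2]
    w = reverse_imbalance_weights(ids)  # head samples 1/6, tail samples 1.0
    print(weighted_sample_without_replacement(w, k=4, seed=42))
    # Hypothetical positive-heavy linkage labels: 7 positives, 1 negative.
    print(class_balancing_weights([1, 1, 1, 1, 1, 1, 1, 0]))  # positives 8/14, negative 4.0
```

In this sketch, re-drawing the node subset with a different seed at each training iteration is one way to obtain the diverse subgraph structures the abstract credits with reducing over-fitting, while the class-balancing weights rescale the loss so that the few negative (or positive) links are not swamped by the majority label.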