Graph contrastive learning (GCL) has attracted a surge of attention due to its superior performance in learning node/graph representations without labels. However, in practice, the unlabeled nodes of a given graph usually follow an implicit imbalanced class distribution, where the majority of nodes belong to a small fraction of classes (a.k.a. head classes) while the remaining classes contain only a few samples (a.k.a. tail classes). This highly imbalanced class distribution inevitably deteriorates the quality of the node representations learned by GCL. Indeed, we empirically find that most state-of-the-art GCL methods perform poorly on imbalanced node classification. Motivated by this observation, we propose a principled GCL framework for Imbalanced node classification (ImGCL), which automatically and adaptively balances the representations learned by GCL without access to labels. Our main inspiration is drawn from the recent progressively balanced sampling (PBS) method in the computer vision domain. We first introduce online-clustering-based PBS, which balances the training set using pseudo-labels obtained from the learned representations. We then develop a node-centrality-based PBS method that better preserves the intrinsic structure of graphs by highlighting the important nodes of the given graph. Moreover, we theoretically consolidate our method by proving that a classifier trained with label-free balanced sampling on an imbalanced dataset converges to the optimal balanced classifier at a linear rate. Extensive experiments on multiple imbalanced graph datasets and imbalance settings verify the effectiveness of the proposed framework, which significantly improves the performance of recent state-of-the-art GCL methods. Further ablations and analysis show that the ImGCL framework markedly improves the representations of nodes in tail classes.
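As a rough illustration of the sampling idea summarized above, the sketch below (our own illustrative Python, not the authors' implementation) obtains pseudo-labels by K-means clustering of the current node embeddings and interpolates per-node sampling probabilities from instance-balanced to class-balanced as training progresses; the embedding matrix `Z`, the number of pseudo-classes `k`, and the linear schedule are assumptions made only for this example.

```python
# Minimal sketch of online-clustering-based progressively balanced sampling (PBS).
# Pseudo-labels come from K-means on the current node embeddings; per-node sampling
# probabilities interpolate between instance-balanced and class-balanced over epochs.
import numpy as np
from sklearn.cluster import KMeans


def pbs_sampling_probs(Z, k, epoch, total_epochs, seed=0):
    """Return per-node sampling probabilities for one training epoch.

    Z            : (n_nodes, dim) array of current node embeddings.
    k            : assumed number of pseudo-classes for clustering.
    epoch        : current epoch (0-based).
    total_epochs : total number of training epochs.
    """
    # 1) Pseudo-labels via clustering of the learned representations.
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Z)

    # 2) Per-class sampling probabilities.
    counts = np.bincount(labels, minlength=k).astype(float)
    p_instance = counts / counts.sum()      # instance-balanced: follows the data
    p_class = np.full(k, 1.0 / k)           # class-balanced: uniform over pseudo-classes

    # 3) Progressive interpolation: start instance-balanced, end class-balanced.
    t = epoch / max(total_epochs - 1, 1)
    p_per_class = (1.0 - t) * p_instance + t * p_class

    # 4) Spread each pseudo-class's probability mass uniformly over its members.
    probs = p_per_class[labels] / counts[labels]
    return probs / probs.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(1000, 32))         # stand-in for learned node embeddings
    probs = pbs_sampling_probs(Z, k=5, epoch=10, total_epochs=100)
    batch = rng.choice(len(Z), size=256, replace=False, p=probs)
    print(batch[:10])
```

In the full framework the per-node weights would further incorporate node centrality so that structurally important nodes are preferentially retained; the uniform spreading in step 4 is the simplest placeholder for that choice.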