Topology-imbalance is a graph-specific imbalance problem caused by the uneven topological positions of labeled nodes, and it significantly degrades the performance of GNNs. What topology-imbalance means and how to measure its impact on graph learning remain under-explored. In this paper, we provide a new understanding of topology-imbalance from a global view of the supervision information distribution in terms of under-reaching and over-squashing, which motivates two quantitative metrics as measurements. In light of our analysis, we propose a novel position-aware graph structure learning framework named PASTEL, which directly optimizes the information propagation path and addresses the topology-imbalance issue at its root. Our key insight is to enhance the connectivity of nodes within the same class so that they receive more supervision information, thereby relieving the under-reaching and over-squashing phenomena. Specifically, we design an anchor-based position encoding mechanism, which better incorporates relative topology position and enhances the intra-class inductive bias by maximizing the label influence. We further propose a class-wise conflict measure as the edge weights, which benefits the separation of different node classes. Extensive experiments demonstrate the superior potential and adaptability of PASTEL in enhancing GNNs' power in different data annotation scenarios.