Message passing neural networks (MPNNs) are powerful models for node classification but suffer from performance limitations under heterophily (low same-class connectivity) and structural bottlenecks in the graph. We provide a unifying statistical framework exposing the relationship between heterophily and bottlenecks through the signal-to-noise ratio (SNR) of MPNN representations. The SNR decomposes model performance into feature-dependent parameters and feature-independent sensitivities. We prove that the sensitivity to class-wise signals is bounded by higher-order homophily -- a generalisation of classical homophily to multi-hop neighbourhoods -- and show that low higher-order homophily manifests locally as the interaction between structural bottlenecks and class labels (class-bottlenecks). Through analysis of graph ensembles, we provide a further quantitative decomposition of bottlenecking into underreaching (lack of depth implying signals cannot arrive) and oversquashing (lack of breadth implying signals arriving on fewer paths) with closed-form expressions. We prove that optimal graph structures for maximising higher-order homophily are disjoint unions of single-class and two-class-bipartite clusters. This yields BRIDGE, a graph ensemble-based rewiring algorithm that achieves near-perfect classification accuracy across all homophily regimes on synthetic benchmarks and significant improvements on real-world benchmarks, by eliminating the ``mid-homophily pitfall'' where MPNNs typically struggle, surpassing current standard rewiring techniques from the literature. Our framework, whose code we make available for public use, provides both diagnostic tools for assessing MPNN performance, and simple yet effective methods for enhancing performance through principled graph modification.
翻译:消息传递神经网络(MPNNs)是节点分类的强大模型,但在异质性(同类连接性低)和图结构瓶颈下存在性能限制。我们提出了一个统一的统计框架,通过MPNN表示的信噪比(SNR)揭示了异质性与瓶颈之间的关系。该SNR将模型性能分解为特征相关参数和特征无关的敏感度。我们证明了类间信号的敏感度受高阶同质性(经典同质性在多跳邻域上的推广)的约束,并表明低阶高阶同质性在局部表现为结构瓶颈与类标签(类瓶颈)的相互作用。通过对图集合的分析,我们进一步将瓶颈效应定量分解为欠到达(深度不足导致信号无法抵达)和过压缩(广度不足导致信号通过更少路径抵达),并给出了闭式表达式。我们证明,最大化高阶同质性的最优图结构是单类和双类二分簇的不相交并集。这催生了BRIDGE——一种基于图集合的重连算法,该算法通过在合成基准测试中所有同质性区间实现近乎完美的分类准确率,并在现实世界基准测试中取得显著提升,消除了MPNN通常陷入的"中同质性陷阱",超越了当前文献中的标准重连技术。我们的框架(代码已公开提供)既提供了评估MPNN性能的诊断工具,也提供了通过原则性图修改来提升性能的简单而有效的方法。