Two Sides of the Same Coin: Heterophily and Oversmoothing in Graph Convolutional Neural Networks

In node classification tasks, heterophily and oversmoothing are two problems that can hurt the performance of graph convolutional neural networks (GCNs). The heterophily problem refers to the model's inability to handle heterophilous graphs, where neighboring nodes belong to different classes; the oversmoothing problem refers to the model's performance degrading as the number of layers increases. These two seemingly unrelated problems have been studied mostly independently, but there is recent empirical evidence that solving one problem may benefit the other. In this work, beyond empirical observations, we aim to: (1) analyze the heterophily and oversmoothing problems from a unified theoretical perspective, (2) identify the common causes of the two problems, and (3) propose simple yet effective strategies to address the common causes. In our theoretical analysis, we show that the common causes of the heterophily and oversmoothing problems, namely the relative degree of a node and its heterophily level, cause the node representations in consecutive layers to "move" closer to the original decision boundary, which increases the misclassification rate of node labels under certain constraints. We theoretically show that: (1) nodes with high heterophily have a higher misclassification rate; (2) even with low heterophily, degree disparity in a node's neighborhood can influence the movements of node representations and result in a "pseudo-heterophily" situation, which helps to explain oversmoothing; and (3) allowing not only positive but also negative messages during message passing can help counteract the common causes of the two problems. Based on our theoretical insights, we propose simple modifications to the GCN architecture (i.e., learned degree corrections and signed messages), and we show with experiments on 9 networks that they alleviate the heterophily and oversmoothing problems.
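The two quantities discussed in the abstract can be illustrated with a small NumPy sketch. It is not the authors' method: the function names `node_heterophily` and `gcn_propagate` are hypothetical, heterophily is taken here to be the fraction of a node's neighbors with a different label, and the propagation is a simplified linear GCN (symmetric normalization with self-loops, no learned weights, no nonlinearity, no degree corrections or signed messages).

```python
import numpy as np


def node_heterophily(adj, labels):
    """Fraction of each node's neighbors whose label differs from its own.

    Assumption: `adj` is a dense, symmetric 0/1 adjacency matrix without
    self-loops. Isolated nodes are reported as 0.0 by convention.
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    differs = (labels[:, None] != labels[None, :]).astype(float)
    degree = adj.sum(axis=1)
    diff_count = (adj * differs).sum(axis=1)
    return np.divide(diff_count, degree,
                     out=np.zeros_like(degree), where=degree > 0)


def gcn_propagate(adj, features, num_layers):
    """Apply D^{-1/2} (A + I) D^{-1/2} to the features `num_layers` times.

    This is a weight-free, linear simplification of GCN message passing,
    used only to show how representations change with depth.
    """
    adj = np.asarray(adj, dtype=float)
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    h = np.asarray(features, dtype=float)
    for _ in range(num_layers):
        h = norm @ h
    return h


if __name__ == "__main__":
    # Path graph 0-1-2-3 with two classes; nodes 1 and 2 sit on the boundary.
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])
    labels = np.array([0, 0, 1, 1])
    features = np.array([1.0, 1.0, -1.0, -1.0])  # class-separating feature

    print("heterophily:", node_heterophily(adj, labels))
    for depth in (1, 5, 20):
        print(depth, "layers:", gcn_propagate(adj, features, depth))
```

On this toy graph the class-separating feature shrinks toward zero (the decision boundary for a sign-based classifier) as depth grows, which is the kind of movement the abstract describes. The sketch does not reproduce the paper's theoretical conditions or its proposed architecture.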
