Oversmoothing is a common phenomenon in a wide range of Graph Neural Networks (GNNs) and Transformers, in which performance degrades as the number of layers increases. Instead of characterizing oversmoothing as complete collapse, where representations converge to a single point, we examine it through the more general lens of dimensional collapse, where representations lie in a narrow cone. Accordingly, inspired by the effectiveness of contrastive learning in preventing dimensional collapse, we propose a novel normalization layer called ContraNorm. Intuitively, ContraNorm implicitly scatters representations in the embedding space, leading to a more uniform distribution and less severe dimensional collapse. Theoretically, we prove that ContraNorm can alleviate both complete collapse and dimensional collapse under certain conditions. The proposed normalization layer can be easily integrated into GNNs and Transformers with negligible parameter overhead. Experiments on various real-world datasets demonstrate the effectiveness of ContraNorm. Our implementation is available at https://github.com/PKU-ML/ContraNorm.
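To make the intuition concrete, below is a minimal PyTorch-style sketch of a ContraNorm-like layer based on the description above: each representation is pushed away from a similarity-weighted average of the others (a "scattering" step inspired by the uniformity objective in contrastive learning), followed by standard layer normalization. The class name `ContraNormSketch`, the hyperparameters `scale` and `tau`, and the exact placement of the softmax are illustrative assumptions; refer to the official repository for the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContraNormSketch(nn.Module):
    """Illustrative ContraNorm-like layer (a sketch, not the official implementation)."""

    def __init__(self, dim, scale=0.1, tau=1.0):
        super().__init__()
        self.scale = scale            # assumed strength of the scattering term
        self.tau = tau                # assumed softmax temperature
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (num_nodes_or_tokens, dim)
        sim = x @ x.t() / self.tau            # pairwise similarities
        attn = F.softmax(sim, dim=1)          # attention-like weights over neighbors
        x = x - self.scale * (attn @ x)       # push each point away from its weighted neighborhood
        return self.norm(x)                   # standard layer normalization on top

# usage example
layer = ContraNormSketch(dim=64)
h = torch.randn(32, 64)                       # 32 node/token embeddings
out = layer(h)                                # same shape, less concentrated directions
```

Because the extra computation is a single similarity matrix and a matrix product, and the only learnable parameters come from the LayerNorm, the parameter overhead of such a layer is indeed negligible, consistent with the claim in the abstract.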