We investigate the representation power of graph neural networks in the semi-supervised node classification task under heterophily or low homophily, i.e., in networks where connected nodes may have different class labels and dissimilar features. Many popular GNNs fail to generalize to this setting, and are even outperformed by models that ignore the graph structure (e.g., multilayer perceptrons). Motivated by this limitation, we identify a set of key designs -- ego- and neighbor-embedding separation, higher-order neighborhoods, and combination of intermediate representations -- that boost learning from the graph structure under heterophily. We combine them into a graph neural network, H2GCN, which we use as the base method to empirically evaluate the effectiveness of the identified designs. Going beyond the traditional benchmarks with strong homophily, our empirical analysis shows that the identified designs increase the accuracy of GNNs by up to 40% and 27% over models without them on synthetic and real networks with heterophily, respectively, and yield competitive performance under homophily.
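The three designs named above can be illustrated with a minimal, non-learned sketch. It is not the H2GCN implementation: it assumes plain mean aggregation and omits the trainable weights, nonlinearity, and classifier a real model would have. The helper names (`edge_homophily`, `neighborhoods`, `embed`) are hypothetical and chosen only for illustration.

```python
from itertools import chain


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a class label."""
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)


def neighborhoods(edges, num_nodes):
    """Return 1-hop and exact 2-hop neighbor sets, both excluding the node itself."""
    n1 = [set() for _ in range(num_nodes)]
    for u, v in edges:
        if u != v:  # drop self-loops so the ego node never enters its own neighborhood
            n1[u].add(v)
            n1[v].add(u)
    n2 = []
    for v in range(num_nodes):
        reach = set(chain.from_iterable(n1[u] for u in n1[v]))
        n2.append(reach - n1[v] - {v})  # higher-order neighborhood: distance exactly 2
    return n1, n2


def mean_rows(rows, width):
    """Column-wise mean of a list of vectors; zeros if the list is empty."""
    if not rows:
        return [0.0] * width
    return [sum(col) / len(rows) for col in zip(*rows)]


def embed(features, edges, rounds=2):
    """Illustrative embedding using the three designs (assumption: mean aggregation)."""
    n = len(features)
    n1, n2 = neighborhoods(edges, n)
    reps = [features]  # round 0: the ego embedding, kept separate from neighbors
    for _ in range(rounds):
        prev = reps[-1]
        width = len(prev[0])
        reps.append([
            # 1-hop and 2-hop aggregates are concatenated, not averaged together
            mean_rows([prev[u] for u in n1[v]], width)
            + mean_rows([prev[u] for u in n2[v]], width)
            for v in range(n)
        ])
    # combine the intermediate representations of all rounds
    return [list(chain.from_iterable(r[v] for r in reps)) for v in range(n)]


# A 4-node path whose labels alternate: every edge joins different classes.
edges = [(0, 1), (1, 2), (2, 3)]
labels = [0, 1, 0, 1]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(edge_homophily(edges, labels))       # 0.0
print(len(embed(features, edges)[0]))      # 14 = 2 + 4 + 8
```

On this fully heterophilous path, the 1-hop aggregate of node 0 reflects the opposite class while its 2-hop aggregate reflects its own class. Keeping the ego, 1-hop, and 2-hop signals in separate slots preserves that distinction, which averaging them together would blur.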