Semi-supervised learning on graphs is a widely applicable problem in network science and machine learning. Two standard algorithms, label propagation and graph neural networks, both operate by repeatedly passing information along edges: the former passes labels, while the latter passes node features modulated by neural networks. These two types of algorithms have largely developed separately, and there is little understanding of which structures in network data make one approach work particularly well compared to the other, or of when the approaches can be meaningfully combined. Here, we develop a Markov random field model for the data generation process of node attributes, based on correlations of attributes on and between vertices, that motivates and unifies these algorithmic approaches. We show that label propagation, a linearized graph convolutional network, and their combination can all be derived as conditional expectations under our model, when conditioning on different attributes. In addition, the data model highlights deficiencies in existing graph neural networks (while producing new algorithmic solutions), serves as a rigorous statistical framework for understanding graph learning issues such as over-smoothing, creates a testbed for evaluating inductive learning performance, and provides a way to sample graph attributes that resemble empirical data. We also find that a new algorithm derived from our data generation model, which we call a Linear Graph Convolution, performs extremely well in practice on empirical data, and we provide theoretical justification for why this is the case.
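To make the contrast between "passing labels" and "passing node features" concrete, here is a minimal sketch assuming a dense NumPy adjacency matrix and one real-valued attribute per node. Label propagation uses the common normalized iteration F ← αSF + (1 − α)Y₀ with S = D^{-1/2} A D^{-1/2}; the linearized graph convolution smooths features k times with a self-loop-normalized S and then fits a ridge regression on the labeled nodes. These are standard textbook forms chosen for illustration, not the paper's derivations, and the function names and the parameters `alpha`, `iters`, `k`, and `ridge` are hypothetical. In particular, `linear_graph_conv` is not the paper's Linear Graph Convolution.

```python
import numpy as np

def sym_normalize(A):
    """Return D^{-1/2} A D^{-1/2}; isolated nodes get all-zero rows and columns."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def label_propagation(A, y, train_mask, alpha=0.9, iters=100):
    """Pass *labels* along edges: iterate F <- alpha * S F + (1 - alpha) * Y0.

    Y0 holds the known labels on training nodes and zeros elsewhere
    (an illustrative choice; alpha and iters are hypothetical defaults).
    """
    S = sym_normalize(A)
    Y0 = np.zeros_like(y, dtype=float)
    Y0[train_mask] = y[train_mask]
    F = Y0.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y0
    return F

def linear_graph_conv(A, X, y, train_mask, k=2, ridge=1e-3):
    """Pass *features* along edges, then fit a linear model (SGC-style sketch).

    Features are smoothed k times with the self-loop-normalized adjacency,
    and a ridge regression is fit on the labeled nodes only.
    """
    n = A.shape[0]
    S = sym_normalize(A + np.eye(n))  # add self-loops before normalizing
    H = X.astype(float)
    for _ in range(k):
        H = S @ H
    HL, yL = H[train_mask], y[train_mask]
    d = H.shape[1]
    W = np.linalg.solve(HL.T @ HL + ridge * np.eye(d), HL.T @ yL)
    return H @ W

# Toy example: a 4-node path graph 0-1-2-3 with labels known only at the ends.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 1.0, -1.0, -1.0])
mask = np.array([True, False, False, True])
X = np.array([[1.0], [0.5], [-0.5], [-1.0]])

lp = label_propagation(A, y, mask)
lgc = linear_graph_conv(A, X, y, mask)
print(np.round(lp, 3), np.round(lgc, 3))
```

On this toy graph both methods assign the unlabeled node 1 the sign of its labeled neighbor 0 and node 2 the sign of node 3; label propagation uses only the graph and the known labels, while the feature-based model uses only the smoothed features and a fitted weight.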