Graph Markov Neural Networks (GMNN) have recently been proposed to improve regular graph neural networks (GNN) by incorporating label dependencies into the semi-supervised node classification task. GMNNs do this in a theoretically principled way and use three kinds of information to predict labels. Like ordinary GNNs, they use the node features and the graph structure, but they also leverage the labels of neighboring nodes to improve the accuracy of their predictions. In this paper, we introduce a new dataset named WikiVitals, which contains a graph of 48k mutually referencing Wikipedia articles classified into 32 categories and connected by 2.3M edges. Our aim is to rigorously evaluate the contributions of three distinct sources of information to the prediction accuracy of GMNN on this dataset: the content of the articles, their connections with each other, and the correlations among their labels. For this purpose, we adapt a method recently proposed for fair comparisons of GNN performance, which relies on an appropriate randomization over data partitions and a clear separation of model selection and model assessment.
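The separation of model selection and model assessment over randomized partitions can be illustrated with a minimal sketch. It assumes a simple random train/validation/test split per partition. The split fractions, the number of partitions, and the `train_and_score` callable are hypothetical placeholders, not the settings or code used in the paper. Inside each partition, candidate configurations are compared on validation nodes only, and the selected configuration is scored once on held-out test nodes. The toy scorer below stands in for actual GMNN training.

```python
import random
import statistics
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Config = Dict[str, object]
# Hypothetical scorer: (config, train_nodes, eval_nodes) -> accuracy on eval_nodes.
Scorer = Callable[[Config, List[int], List[int]], float]


def random_partition(nodes: Sequence[int], train_frac: float, val_frac: float,
                     rng: random.Random) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle node ids and split them into disjoint train/validation/test sets."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def assess(nodes: Sequence[int], configs: List[Config], train_and_score: Scorer,
           n_partitions: int = 10, train_frac: float = 0.6, val_frac: float = 0.2,
           seed: int = 0) -> Tuple[float, float]:
    """Return mean and standard deviation of test accuracy over random partitions."""
    rng = random.Random(seed)
    test_scores = []
    for _ in range(n_partitions):
        train, val, test = random_partition(nodes, train_frac, val_frac, rng)
        # Model selection: every candidate is compared on validation nodes only.
        best = max(configs, key=lambda cfg: train_and_score(cfg, train, val))
        # Model assessment: only the selected configuration ever sees the test nodes.
        test_scores.append(train_and_score(best, train, test))
    return statistics.mean(test_scores), statistics.stdev(test_scores)


# Toy stand-in for a node classifier (hypothetical): label = node id parity.
LABELS = {v: v % 2 for v in range(100)}


def toy_train_and_score(cfg: Config, train_nodes: List[int], eval_nodes: List[int]) -> float:
    if cfg["rule"] == "parity":
        predict = lambda v: v % 2
    else:
        # Predict the majority label observed among the training nodes.
        majority = Counter(LABELS[v] for v in train_nodes).most_common(1)[0][0]
        predict = lambda v: majority
    return sum(predict(v) == LABELS[v] for v in eval_nodes) / len(eval_nodes)


mean, std = assess(range(100), [{"rule": "parity"}, {"rule": "majority"}],
                   toy_train_and_score)
print(f"test accuracy: {mean:.3f} +/- {std:.3f}")  # prints "test accuracy: 1.000 +/- 0.000"
```

Reporting the mean and spread over several random partitions, rather than a single fixed split, keeps the reported accuracy from depending on one lucky or unlucky partition of the nodes.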