Despite the enormous success of graph neural networks (GNNs) in applications, theoretical understanding of their generalization ability has been sparse, especially for node-level tasks where the data are not independent and identically distributed (IID). A theoretical investigation of generalization performance helps in understanding fundamental issues of GNN models (such as fairness) and in designing better learning methods. In this paper, we present a novel PAC-Bayesian analysis for GNNs under a non-IID semi-supervised learning setup. Moreover, we analyze the generalization performance on different subgroups of unlabeled nodes, which allows us to further study an accuracy-(dis)parity-style (un)fairness of GNNs from a theoretical perspective. Under reasonable assumptions, we demonstrate that the distance between a test subgroup and the training set can be a key factor affecting the GNN performance on that subgroup, which calls for careful attention to the selection of training nodes for fair learning. Experiments across multiple GNN models and datasets support our theoretical results.
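To make the distance-based intuition concrete, the following is a minimal sketch (not the paper's formal construction) of one way such a subgroup-to-training-set distance could be measured: aggregate node features over one-hop neighborhoods, as a GNN layer does, and compute the average distance from each subgroup node to its nearest training node in that aggregated-feature space. All names here (adj, feats, train_idx, subgroup_idx) are hypothetical placeholders introduced only for illustration.

```python
# Illustrative sketch only: a plausible subgroup-to-training-set distance in
# aggregated-feature space, not the paper's exact theoretical quantity.
import numpy as np


def aggregate_features(adj: np.ndarray, feats: np.ndarray) -> np.ndarray:
    """One-hop mean aggregation (with self-loops), mimicking the neighborhood
    averaging of a GNN layer before any learned transformation."""
    adj_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = adj_hat.sum(axis=1, keepdims=True)      # node degrees (>= 1)
    return adj_hat @ feats / deg                  # row-normalized aggregation


def subgroup_distance(adj, feats, train_idx, subgroup_idx) -> float:
    """Average Euclidean distance from each subgroup node to its nearest
    training node, measured on aggregated features."""
    agg = aggregate_features(adj, feats)
    train_agg = agg[train_idx]                    # (n_train, d)
    sub_agg = agg[subgroup_idx]                   # (n_sub, d)
    # Pairwise distances, then nearest-training-node distance per subgroup node.
    dists = np.linalg.norm(sub_agg[:, None, :] - train_agg[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 4
    adj = (rng.random((n, n)) < 0.4).astype(float)
    adj = np.triu(adj, 1)
    adj = adj + adj.T                             # symmetric adjacency, no self-loops
    feats = rng.normal(size=(n, d))
    print(subgroup_distance(adj, feats, train_idx=[0, 1, 2], subgroup_idx=[5, 6, 7]))
```

Under the paper's thesis, a subgroup with a larger value of such a distance would be expected to see weaker generalization, which is why training node selection matters for accuracy parity across subgroups.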