Graph neural networks (GNNs) can process graphs of different sizes, but their ability to generalize across sizes, specifically from small to large graphs, is still not well understood. In this paper, we identify an important type of data where generalization from small to large graphs is challenging: graph distributions for which the local structure depends on the graph size. This effect occurs in multiple important graph learning domains, including social and biological networks. We first prove that when there is a difference between the local structures, GNNs are not guaranteed to generalize across sizes: there are "bad" global minima that do well on small graphs but fail on large graphs. We then study the size-generalization problem empirically and demonstrate that when there is a discrepancy in local structure, GNNs tend to converge to non-generalizing solutions in practice. Finally, we suggest two approaches for improving size generalization, motivated by our findings. Notably, we propose a novel Self-Supervised Learning (SSL) task aimed at learning meaningful representations of local structures that appear in large graphs. Our SSL task improves classification accuracy on several popular datasets.
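To make the idea of an SSL pretext task on local structures concrete, here is a minimal, hypothetical sketch: pretrain a small GNN to predict simple per-node local-structure descriptors (node degree and mean neighbor degree) computed on a large unlabeled graph, before fine-tuning on labeled small graphs. This is not the paper's actual SSL task or architecture; the model, descriptors, and hyperparameters below are illustrative assumptions.

```python
# Hypothetical sketch of self-supervised pretraining on local structure.
# The descriptor (degree + mean neighbor degree) is a stand-in for the
# richer local-structure representations discussed in the paper.
import torch
import torch.nn as nn

class SimpleGNNLayer(nn.Module):
    """Mean-aggregation message passing: h' = relu(W1 h + W2 mean(neighbors))."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_nbr = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        nbr_mean = adj @ h / deg  # mean over each node's neighbors
        return torch.relu(self.w_self(h) + self.w_nbr(nbr_mean))

class GNN(nn.Module):
    def __init__(self, in_dim, hidden, out_dim, layers=2):
        super().__init__()
        dims = [in_dim] + [hidden] * layers
        self.layers = nn.ModuleList(
            [SimpleGNNLayer(d_in, d_out) for d_in, d_out in zip(dims, dims[1:])])
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, h, adj):
        for layer in self.layers:
            h = layer(h, adj)
        return self.head(h)

def local_structure_targets(adj):
    """Per-node descriptors: own degree and mean degree of neighbors."""
    deg = adj.sum(dim=1, keepdim=True)
    mean_nbr_deg = adj @ deg / deg.clamp(min=1)
    return torch.cat([deg, mean_nbr_deg], dim=1)

# Random stand-in for a large unlabeled graph (symmetric, no self-loops).
n = 500
adj = (torch.rand(n, n) < 0.02).float()
adj = ((adj + adj.t()) > 0).float()
adj.fill_diagonal_(0)
feats = torch.ones(n, 1)  # constant input features

model = GNN(in_dim=1, hidden=32, out_dim=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
targets = local_structure_targets(adj)

# Self-supervised pretraining: regress local-structure descriptors.
for step in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(feats, adj), targets)
    loss.backward()
    opt.step()
# The pretrained message-passing layers can then be fine-tuned on the
# downstream classification task over small labeled graphs.
```

The design intuition is that pretraining forces the GNN's representations to distinguish local structures that occur in large graphs, so that a classifier fine-tuned on small graphs is less likely to converge to a solution that ignores them.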