Graph neural networks (GNNs) can process graphs of different sizes, but their ability to generalize across sizes, specifically from small to large graphs, is still not well understood. In this paper, we identify an important type of data where generalization from small to large graphs is challenging: graph distributions for which the local structure depends on the graph size. This effect occurs in multiple important graph learning domains, including social and biological networks. We first prove that when there is a difference between the local structures, GNNs are not guaranteed to generalize across sizes: there are "bad" global minima that do well on small graphs but fail on large graphs. We then study the size-generalization problem empirically and demonstrate that when there is a discrepancy in local structure, GNNs tend to converge to non-generalizing solutions. Finally, we suggest two approaches for improving size generalization, motivated by our findings. Notably, we propose a novel Self-Supervised Learning (SSL) task aimed at learning meaningful representations of local structures that appear in large graphs. Our SSL task improves classification accuracy on several popular datasets.
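To make the core phenomenon concrete, here is a minimal sketch (not from the paper) of how local structure can depend on graph size. It assumes `networkx` and uses Barabási–Albert preferential-attachment graphs purely as an illustrative generative model: graphs of different sizes drawn from the same model exhibit different local statistics, such as the clustering coefficient, which is the kind of small-to-large discrepancy the abstract describes.

```python
# Illustrative sketch (assumptions: networkx, Barabasi-Albert graphs as the
# size-dependent graph distribution; not the paper's actual experimental setup).
import networkx as nx
import numpy as np

def mean_local_stats(n_nodes, n_graphs=50, m=2, seed=0):
    """Average node degree and clustering coefficient over sampled BA graphs."""
    rng = np.random.default_rng(seed)
    degrees, clustering = [], []
    for _ in range(n_graphs):
        g = nx.barabasi_albert_graph(n_nodes, m, seed=int(rng.integers(1 << 31)))
        degrees.extend(d for _, d in g.degree())
        clustering.extend(nx.clustering(g).values())
    return np.mean(degrees), np.mean(clustering)

small_deg, small_clust = mean_local_stats(50)
large_deg, large_clust = mean_local_stats(1000)
print(f"small graphs (n=50):    mean degree {small_deg:.2f}, clustering {small_clust:.3f}")
print(f"large graphs (n=1000):  mean degree {large_deg:.2f}, clustering {large_clust:.3f}")
# The clustering coefficient (a local structural statistic) differs between the
# two size regimes even though both come from the same generative model, so a
# GNN trained only on the small graphs never sees the local patterns of the
# large ones.
```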