In the past few years, graph neural networks (GNNs) have become the de facto model of choice for graph classification. While most GNNs can, in theory, operate on graphs of any size, their classification performance is empirically observed to degrade when they are applied to graphs with sizes that differ from those in the training data. Previous works have tried to tackle this issue in graph classification either by providing the model with inductive biases derived from assumptions about the generative process of the graphs, or by requiring access to graphs from the test domain. The first strategy is tied to the quality of the assumptions made about the generative process, and requires specific models designed around an explicit definition of that process, leaving open the question of how to improve the performance of generic GNN models in general settings. The second strategy, on the other hand, can be applied to any GNN, but requires access to information that is not always easy to obtain. In this work we consider the scenario in which only the training data is available, and we propose a regularization strategy that can be applied to any GNN to improve its generalization from smaller to larger graphs without requiring access to the test data. Our regularization is based on the idea of simulating a shift in the size of the training graphs using coarsening techniques, and encouraging the model to be robust to such a shift. Experimental results on standard datasets show that popular GNN models, trained on the 50% smallest graphs in the dataset and tested on the 10% largest graphs, obtain performance improvements of up to 30% when trained with our regularization strategy.
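To make the idea concrete, the following is a minimal sketch in plain PyTorch of this style of regularization, not the paper's exact method: a toy GNN is trained with a standard classification loss plus a consistency term between its predictions on a graph and on a coarsened version of that graph. The naive random-cluster coarsening (`coarsen_random`), the KL-based consistency term, and the weight `SIZE_SHIFT_WEIGHT` are all illustrative assumptions, not details taken from the paper.

```python
# Sketch of size-shift regularization: penalize the divergence between the
# model's predictions on a training graph and on a coarsened version of it.
import torch
import torch.nn.functional as F

class TinyGNN(torch.nn.Module):
    """One-layer mean-aggregation GNN with a mean-pool graph readout."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.lin1 = torch.nn.Linear(in_dim, hidden_dim)
        self.lin2 = torch.nn.Linear(hidden_dim, num_classes)

    def forward(self, x, adj):
        # Mean aggregation over neighbors (including self-loops).
        adj = adj + torch.eye(adj.size(0))
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = F.relu(self.lin1((adj / deg) @ x))
        return self.lin2(h.mean(dim=0))  # graph-level logits

def coarsen_random(x, adj, ratio=0.5):
    """Naive stand-in coarsening (hypothetical): assign nodes to random
    clusters and average features / merge connectivity within clusters."""
    n = x.size(0)
    k = max(1, int(n * ratio))
    assign = torch.zeros(n, k)
    assign[torch.arange(n), torch.randint(k, (n,))] = 1.0
    sizes = assign.sum(dim=0).clamp(min=1.0)
    x_c = (assign.t() @ x) / sizes.unsqueeze(1)         # cluster-mean features
    adj_c = (assign.t() @ adj @ assign).clamp(max=1.0)  # merged connectivity
    adj_c.fill_diagonal_(0)
    return x_c, adj_c

model = TinyGNN(in_dim=8, hidden_dim=16, num_classes=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
SIZE_SHIFT_WEIGHT = 0.1  # regularization strength (hypothetical value)

# Toy training graph: 10 nodes, random features, random symmetric adjacency.
x = torch.randn(10, 8)
adj = (torch.rand(10, 10) > 0.7).float()
adj = ((adj + adj.t()) > 0).float().fill_diagonal_(0)
label = torch.tensor(1)

for step in range(100):
    opt.zero_grad()
    logits = model(x, adj)
    x_c, adj_c = coarsen_random(x, adj)
    logits_c = model(x_c, adj_c)
    # Classification loss plus a consistency term that pushes the prediction
    # on the coarsened (size-shifted) graph toward the original prediction.
    loss = F.cross_entropy(logits.unsqueeze(0), label.unsqueeze(0))
    reg = F.kl_div(F.log_softmax(logits_c, dim=-1),
                   F.softmax(logits, dim=-1).detach(), reduction="sum")
    (loss + SIZE_SHIFT_WEIGHT * reg).backward()
    opt.step()
```

In a real pipeline one would presumably replace the random clustering with a principled graph coarsening algorithm (e.g., spectral or Graclus-style coarsening) and apply the consistency term over mini-batches of graphs; the sketch only illustrates the train-time simulation of a size shift and the robustness penalty attached to it.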