Research on the theoretical expressiveness of Graph Neural Networks (GNNs) has developed rapidly, and many methods have been proposed to enhance expressiveness. However, most methods lack a uniform expressiveness measure, except for a few that strictly follow the $k$-dimensional Weisfeiler-Lehman ($k$-WL) test hierarchy. Their theoretical analyses are often limited to distinguishing certain families of non-isomorphic graphs, which makes it difficult to compare their expressiveness quantitatively. In contrast to theoretical analysis, another way to measure expressiveness is to evaluate model performance on datasets containing 1-WL-indistinguishable graphs. Previous datasets specifically designed for this purpose, however, face problems with difficulty (any model surpassing 1-WL reaches nearly 100% accuracy), granularity (models tend to be either 100% correct or close to random guessing), and scale (only a few essentially different graphs in each dataset). To address these limitations, we propose a new expressiveness dataset, $\textbf{BREC}$, which includes 400 pairs of non-isomorphic graphs carefully selected from four primary categories (Basic, Regular, Extension, and CFI). These graphs have higher difficulty (up to 4-WL), finer granularity (able to compare models between 1-WL and 3-WL), and a larger scale (400 pairs). Further, we comprehensively test 16 models with higher-than-1-WL expressiveness on our BREC dataset. Our experiments give the first thorough comparison of the expressiveness of these state-of-the-art beyond-1-WL GNN models. We expect this dataset to serve as a benchmark for testing the expressiveness of future GNNs. Our dataset and evaluation code are released at: https://github.com/GraphPKU/BREC.
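To make the notion of 1-WL-indistinguishability concrete, the sketch below runs the classic 1-WL color-refinement test on a pair of non-isomorphic graphs that it cannot separate (a 6-cycle versus two disjoint triangles). This is a minimal illustrative example only; the `wl_1_histogram` helper is hypothetical and is not part of the released BREC code.

```python
# Minimal sketch of the 1-WL (color refinement) test, illustrating the kind of
# graph pairs BREC targets. The two graphs below -- a 6-cycle and two disjoint
# triangles -- are non-isomorphic yet receive identical 1-WL color histograms.
from collections import Counter

import networkx as nx


def wl_1_histogram(G: nx.Graph, rounds: int = 3) -> Counter:
    """Run 1-WL color refinement and return the final color histogram."""
    colors = {v: 0 for v in G.nodes}  # start from a uniform coloring
    for _ in range(rounds):
        new_colors = {}
        for v in G.nodes:
            # New color = hash of own color plus the multiset of neighbor colors.
            neighborhood = tuple(sorted(colors[u] for u in G.neighbors(v)))
            new_colors[v] = hash((colors[v], neighborhood))
        colors = new_colors
    return Counter(colors.values())


G1 = nx.cycle_graph(6)                                        # one 6-cycle
G2 = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))  # two triangles

print(nx.is_isomorphic(G1, G2))                  # False: genuinely different graphs
print(wl_1_histogram(G1) == wl_1_histogram(G2))  # True: 1-WL cannot tell them apart
```

Any GNN whose expressiveness is bounded by 1-WL will likewise produce identical representations for such a pair; BREC collects progressively harder pairs of this kind (up to 4-WL-indistinguishable) to grade models between 1-WL and 3-WL.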