Research on the theoretical expressiveness of Graph Neural Networks (GNNs) has developed rapidly, and many methods have been proposed to enhance expressiveness. However, most methods lack a uniform expressiveness measure, except for a few that strictly follow the $k$-dimensional Weisfeiler-Lehman ($k$-WL) test hierarchy. Their theoretical analyses are often limited to distinguishing certain families of non-isomorphic graphs, which makes it difficult to compare their expressiveness quantitatively. In contrast to theoretical analysis, another way to measure expressiveness is to evaluate model performance on datasets containing 1-WL-indistinguishable graphs. Previous datasets specifically designed for this purpose, however, face problems with difficulty (any model surpassing 1-WL reaches nearly 100% accuracy), granularity (models tend to be either 100% correct or close to random guessing), and scale (only a few essentially different graphs in each dataset). To address these limitations, we propose a new expressiveness dataset, $\textbf{BREC}$, which includes 400 pairs of non-isomorphic graphs carefully selected from four primary categories (Basic, Regular, Extension, and CFI). These graphs have higher difficulty (up to 4-WL), finer granularity (able to compare models between 1-WL and 3-WL), and a larger scale (400 pairs). Further, we comprehensively test 16 models with higher-than-1-WL expressiveness on our BREC dataset. Our experiments give the first thorough comparison of the expressiveness of these state-of-the-art beyond-1-WL GNN models. We expect this dataset to serve as a benchmark for testing the expressiveness of future GNNs. Our dataset and evaluation code are released at: https://github.com/GraphPKU/BREC.
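To make the notion of 1-WL-indistinguishability concrete, the sketch below runs the classic 1-WL color-refinement test on a pair of non-isomorphic graphs that it cannot separate (a 6-cycle versus two disjoint triangles). This is a minimal illustrative example only; the `wl_1_histogram` helper is hypothetical and is not part of the released BREC code.

```python
# Minimal sketch of the 1-WL (color refinement) test, illustrating the kind of
# graph pairs BREC targets. The two graphs below -- a 6-cycle and two disjoint
# triangles -- are non-isomorphic yet receive identical 1-WL color histograms.
from collections import Counter

import networkx as nx


def wl_1_histogram(G: nx.Graph, rounds: int = 3) -> Counter:
    """Run 1-WL color refinement and return the final color histogram."""
    colors = {v: 0 for v in G.nodes}  # start from a uniform coloring
    for _ in range(rounds):
        new_colors = {}
        for v in G.nodes:
            # New color = hash of own color plus the multiset of neighbor colors.
            neighborhood = tuple(sorted(colors[u] for u in G.neighbors(v)))
            new_colors[v] = hash((colors[v], neighborhood))
        colors = new_colors
    return Counter(colors.values())


G1 = nx.cycle_graph(6)                                        # one 6-cycle
G2 = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))  # two triangles

print(nx.is_isomorphic(G1, G2))                  # False: genuinely different graphs
print(wl_1_histogram(G1) == wl_1_histogram(G2))  # True: 1-WL cannot tell them apart
```

Any GNN whose expressiveness is bounded by 1-WL will likewise produce identical representations for such a pair; BREC collects progressively harder pairs of this kind (up to 4-WL-indistinguishable) to grade models between 1-WL and 3-WL.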