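Long Range Graph Benchmark (LRGB)
Vijay Dwivedi (NTU, Singapore) has published a new blog post on long-range graph benchmarks, introducing 5 new challenging tasks across node classification, link prediction, graph classification, and graph regression.
“Many of the existing graph learning benchmarks consist of prediction tasks that primarily rely on local structural information rather than distant information propagation to compute a target label or metric. This can be observed in datasets such as ZINC, ogbg-molhiv and ogbg-molpcba where models that rely significantly on encoding local (or, near-local) structural information continue to be among leaderboard toppers.”
LRGB is a new collection of datasets designed to evaluate the long-range capabilities of MPNNs and graph transformers. The node classification tasks come from the image-based Pascal-VOC and COCO datasets. The link prediction task comes from PCQM4M and asks about links between atoms that are far apart in 2D space (5+ hops away) but close in 3D space, with only 2D features provided. The graph-level tasks focus on predicting the structures and functions of small proteins (peptides).
Message passing networks (MPNNs) are known to suffer from bottleneck effects and oversquashing, and therefore perform poorly on long-range tasks. The first LRGB experiments confirm this: fully-connected graph transformers outperform MPNNs by a clear margin. That leaves plenty of room for improving MPNNs!
The original text follows.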
📏 Long Range Graph Benchmark
Vijay Dwivedi (NTU, Singapore) published a new blog post on long-range graph benchmarks, introducing 5 new challenging tasks in node classification, link prediction, graph classification, and graph regression.
“Many of the existing graph learning benchmarks consist of prediction tasks that primarily rely on local structural information rather than distant information propagation to compute a target label or metric. This can be observed in datasets such as ZINC, ogbg-molhiv and ogbg-molpcba where models that rely significantly on encoding local (or, near-local) structural information continue to be among leaderboard toppers.”
LRGB, a new collection of datasets, aims at evaluating the long-range capabilities of MPNNs and graph transformers. In particular, the node classification tasks are derived from the image-based Pascal-VOC and COCO datasets. The link prediction task is derived from PCQM4M and asks about links between atoms that are distant in the 2D space (5+ hops away) but close in the 3D space, where only 2D features are given. The graph-level tasks focus on predicting structures and functions of small proteins (peptides).
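To make the link prediction target more concrete, here is a minimal sketch of how "far in 2D, close in 3D" pairs could be picked out of a molecule. It is not the benchmark's actual preprocessing: the function names, the toy hairpin molecule, and the `max_dist=3.5` cutoff are all illustrative assumptions, and only the "5+ hops" criterion comes from the description above.

```python
from collections import deque
from itertools import combinations
import math


def hop_distances(adj, source):
    """Breadth-first search: hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def long_range_contacts(adj, coords, min_hops=5, max_dist=3.5):
    """Return node pairs that are >= min_hops apart in the graph
    but within max_dist of each other in 3D space.

    max_dist is a placeholder cutoff chosen for this sketch.
    """
    hops = {u: hop_distances(adj, u) for u in adj}
    pairs = []
    for u, v in combinations(sorted(adj), 2):
        h = hops[u].get(v)  # None if v is unreachable from u
        if h is None or h < min_hops:
            continue
        if math.dist(coords[u], coords[v]) <= max_dist:
            pairs.append((u, v))
    return pairs


# Toy "molecule": a chain of 7 atoms folded into a hairpin,
# so the two ends of the chain end up near each other in space.
adj = {i: [] for i in range(7)}
for i in range(6):
    adj[i].append(i + 1)
    adj[i + 1].append(i)

coords = {
    0: (0.0, 0.0, 0.0), 1: (1.5, 0.0, 0.0), 2: (3.0, 0.0, 0.0),
    3: (4.0, 1.0, 0.0),
    4: (3.0, 2.0, 0.0), 5: (1.5, 2.0, 0.0), 6: (0.0, 2.0, 0.0),
}

print(long_range_contacts(adj, coords))  # [(0, 5), (0, 6), (1, 6)]
```

A model that sees only the 2D graph has to infer these contacts without access to `coords`, which is what makes the task depend on information from distant nodes.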
Message passing networks (MPNNs) are known to suffer from bottleneck effects and oversquashing and, hence, underperform on long-range tasks. The first LRGB experiments confirm this, showing that fully-connected graph transformers outperform MPNNs by a clear margin. There is plenty of room for improving MPNNs!
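One ingredient of the long-range problem is simply reach: each message passing layer moves information one hop, so distant nodes need many layers to interact, while a fully-connected model links every pair in a single step. The sketch below illustrates only that reach argument on toy graphs. It does not model oversquashing itself, and the function name and graphs are made up for illustration.

```python
def message_passing_reach(adj, source, num_layers):
    """Nodes whose state can depend on `source` after `num_layers`
    rounds of neighbor aggregation."""
    # reached[v] is True once information from `source` could have arrived at v.
    reached = {v: v == source for v in adj}
    for _ in range(num_layers):
        # One layer: every node combines its own state with its neighbors' states.
        reached = {
            v: reached[v] or any(reached[u] for u in adj[v])
            for v in adj
        }
    return sorted(v for v, hit in reached.items() if hit)


n = 8
# Path graph 0-1-2-...-7: the sparse structure an MPNN passes messages over.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
# Fully-connected graph: every node attends to every other node.
full = {i: [j for j in range(n) if j != i] for i in range(n)}

print(message_passing_reach(path, 0, 3))  # [0, 1, 2, 3]
print(message_passing_reach(full, 0, 1))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

On the path graph, node 7 only hears from node 0 after seven layers, and by then the message has been mixed with everything in between.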
Paper, Code, Leaderboard