We propose an extension to the transformer neural network architecture for general-purpose graph learning by adding a dedicated pathway for pairwise structural information, called edge channels. The resulting framework, which we call the Edge-augmented Graph Transformer (EGT), can directly accept, process, and output structural information of arbitrary form, which is important for effective learning on graph-structured data. Our model exclusively uses global self-attention as an aggregation mechanism rather than static, localized convolutional aggregation. This allows for unconstrained long-range dynamic interactions between nodes. Moreover, the edge channels allow the structural information to evolve from layer to layer, and prediction tasks on edges/links can be performed directly from the output embeddings of these channels. We verify the performance of EGT in a wide range of graph-learning experiments on benchmark datasets, in which it outperforms Convolutional/Message-Passing Graph Neural Networks. EGT sets a new state of the art for the quantum-chemical regression task on the OGB-LSC PCQM4Mv2 dataset, which contains 3.8 million molecular graphs. Our findings indicate that global self-attention based aggregation can serve as a flexible, adaptive, and effective replacement for graph convolution in general-purpose graph learning. Therefore, convolutional local neighborhood aggregation is not an essential inductive bias.
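To make the abstract concrete, the following is a minimal, simplified sketch (not the authors' reference implementation) of one EGT-style layer: node embeddings interact through global self-attention whose logits are biased and gated by an edge-channel embedding, and the edge channel itself is updated from the attention logits, so pairwise structural information can evolve from layer to layer. All dimension names, gating choices, and module sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EGTLayerSketch(nn.Module):
    """Sketch of an edge-augmented transformer layer (assumed simplification)."""

    def __init__(self, d_h=64, d_e=16, n_heads=8):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_h // n_heads
        self.qkv = nn.Linear(d_h, 3 * d_h)
        self.e_bias = nn.Linear(d_e, n_heads)    # edge channel -> attention bias
        self.e_gate = nn.Linear(d_e, n_heads)    # edge channel -> attention gate
        self.e_update = nn.Linear(n_heads, d_e)  # attention logits -> edge update
        self.out = nn.Linear(d_h, d_h)
        self.ffn_h = nn.Sequential(nn.Linear(d_h, 2 * d_h), nn.GELU(), nn.Linear(2 * d_h, d_h))
        self.ffn_e = nn.Sequential(nn.Linear(d_e, 2 * d_e), nn.GELU(), nn.Linear(2 * d_e, d_e))
        self.norm_h1, self.norm_h2 = nn.LayerNorm(d_h), nn.LayerNorm(d_h)
        self.norm_e1, self.norm_e2 = nn.LayerNorm(d_e), nn.LayerNorm(d_e)

    def forward(self, h, e):
        # h: (B, N, d_h) node channels; e: (B, N, N, d_e) edge channels
        B, N, _ = h.shape
        hn, en = self.norm_h1(h), self.norm_e1(e)
        q, k, v = self.qkv(hn).chunk(3, dim=-1)
        q = q.view(B, N, self.n_heads, self.d_head).transpose(1, 2)  # (B, H, N, d)
        k = k.view(B, N, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, N, self.n_heads, self.d_head).transpose(1, 2)

        # Global self-attention with pairwise structural bias from the edge channels.
        logits = q @ k.transpose(-2, -1) / self.d_head ** 0.5          # (B, H, N, N)
        logits = logits + self.e_bias(en).permute(0, 3, 1, 2)
        attn = torch.sigmoid(self.e_gate(en).permute(0, 3, 1, 2)) * F.softmax(logits, dim=-1)

        h = h + self.out((attn @ v).transpose(1, 2).reshape(B, N, -1))
        h = h + self.ffn_h(self.norm_h2(h))

        # Edge channels evolve: update them from the (pre-softmax) attention logits,
        # so structural information can be refined layer by layer and read out for
        # edge/link prediction tasks.
        e = e + self.e_update(logits.permute(0, 2, 3, 1))
        e = e + self.ffn_e(self.norm_e2(e))
        return h, e
```

As a usage illustration, stacking several such layers over initial node features and pairwise structural encodings (e.g., embeddings of edge features or shortest-path distances) yields both node-level and edge-level output embeddings, which is the property the abstract highlights.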