The graph Transformer has emerged as a new architecture and has shown superior performance on various graph mining tasks. In this work, we observe that existing graph Transformers treat nodes as independent tokens and construct a single long sequence composed of all node tokens in order to train the Transformer model, which makes it hard to scale to large graphs due to the quadratic complexity in the number of nodes of the self-attention computation. To this end, we propose a Neighborhood Aggregation Graph Transformer (NAGphormer) that treats each node as a sequence of tokens constructed by our proposed Hop2Token module. For each node, Hop2Token aggregates the neighborhood features from different hops into different representations, thereby producing a sequence of token vectors as one input. In this way, NAGphormer can be trained in a mini-batch manner and thus scales to large graphs. Moreover, we mathematically show that, compared to a representative category of advanced Graph Neural Networks (GNNs), the decoupled Graph Convolutional Network, NAGphormer can learn more informative node representations from the multi-hop neighborhoods. Extensive experiments on benchmark datasets ranging from small to large demonstrate that NAGphormer consistently outperforms existing graph Transformers and mainstream GNNs. Code is available at https://github.com/JHL-HUST/NAGphormer.
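To make the Hop2Token idea concrete, below is a minimal sketch of how per-hop neighborhood aggregations could be precomputed into a token sequence for every node. It assumes the multi-hop aggregation uses powers of the symmetrically normalized adjacency matrix, a common choice in decoupled GCNs; the function name `hop2token` and its parameters are illustrative, not the authors' exact implementation.

```python
import numpy as np
import scipy.sparse as sp

def hop2token(adj, features, num_hops):
    """Illustrative sketch: for every node, build one aggregated feature
    vector per hop (0..num_hops) using powers of the symmetrically
    normalized adjacency matrix. The result has shape
    (num_nodes, num_hops + 1, feature_dim), i.e. one token sequence per
    node, which can then be split into mini-batches of nodes."""
    num_nodes = adj.shape[0]
    # Add self-loops and symmetrically normalize: A_hat = D^{-1/2} (A + I) D^{-1/2}
    adj = adj + sp.eye(num_nodes)
    deg = np.asarray(adj.sum(axis=1)).flatten()
    d_inv_sqrt = np.power(deg, -0.5)
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.0
    adj_norm = sp.diags(d_inv_sqrt) @ adj @ sp.diags(d_inv_sqrt)

    tokens = [features]  # hop 0: the node's own features
    propagated = features
    for _ in range(num_hops):
        propagated = adj_norm @ propagated  # aggregate one hop further
        tokens.append(propagated)
    return np.stack(tokens, axis=1)
```

Because the token sequences are precomputed once, each training batch only needs the rows of this tensor for the nodes in the batch, which is what allows mini-batch training and scaling to large graphs.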