Transformer models have recently gained popularity in graph representation learning as they have the potential to learn complex relationships beyond the ones captured by regular graph neural networks. The main research question is how to inject the structural bias of graphs into the transformer architecture, and several proposals have been made for undirected molecular graphs and, recently, also for larger network graphs. In this paper, we study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs: (1) an attention mechanism that is more efficient than the regular quadratic complexity of transformers and at the same time faithfully captures the DAG structure, and (2) a positional encoding of the DAG's partial order, complementing the former. We rigorously evaluate our framework in ablation studies and show that it is effective in improving different kinds of baseline transformers over various types of data, in experiments ranging from classifying source code graphs to nodes in self-citation networks. In particular, our proposal makes (graph) transformers competitive with, or even superior to, graph neural networks tailored to DAGs.
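To make the two components concrete, the sketch below illustrates one plausible reading of them: attention restricted to the DAG's reachability relation and a sinusoidal encoding of each node's depth in the partial order. This is a minimal toy example under our own assumptions, not the paper's actual formulation; all function names (`reachability`, `dag_attention_mask`, `depth_positional_encoding`, `masked_attention`) are ours.

```python
# Illustrative sketch only: DAG-masked attention + depth-based positional
# encoding. Assumes a DAG given as a boolean adjacency matrix adj[u, v] = True
# iff there is an edge u -> v. Not the paper's exact method.
import numpy as np


def reachability(adj: np.ndarray) -> np.ndarray:
    """Reflexive-transitive closure (reachability) of a DAG."""
    n = adj.shape[0]
    reach = np.eye(n, dtype=bool) | adj.astype(bool)
    for _ in range(n):
        new = reach | ((reach.astype(int) @ reach.astype(int)) > 0)
        if (new == reach).all():
            break
        reach = new
    return reach


def dag_attention_mask(adj: np.ndarray) -> np.ndarray:
    """mask[i, j] = True iff node i may attend to node j (its ancestors and itself)."""
    return reachability(adj).T


def topological_order(adj: np.ndarray) -> list:
    """Kahn's algorithm for a topological order of the DAG."""
    n = adj.shape[0]
    indeg = adj.sum(axis=0).astype(int)
    order, frontier = [], [v for v in range(n) if indeg[v] == 0]
    while frontier:
        v = frontier.pop()
        order.append(v)
        for w in np.where(adj[v])[0]:
            indeg[w] -= 1
            if indeg[w] == 0:
                frontier.append(int(w))
    return order


def depth_positional_encoding(adj: np.ndarray, d_model: int) -> np.ndarray:
    """Sinusoidal encoding of each node's depth (longest path from a source node)."""
    n = adj.shape[0]
    depth = np.zeros(n, dtype=int)
    for v in topological_order(adj):
        preds = np.where(adj[:, v])[0]
        if preds.size:
            depth[v] = depth[preds].max() + 1
    freq = np.power(10000.0, (np.arange(d_model) // 2 * 2) / d_model)
    pos = depth[:, None] / freq[None, :]
    pe = np.zeros((n, d_model))
    pe[:, 0::2] = np.sin(pos[:, 0::2])
    pe[:, 1::2] = np.cos(pos[:, 1::2])
    return pe


def masked_attention(X: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention with a boolean attend-mask."""
    d = X.shape[1]
    scores = (X @ X.T) / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X


if __name__ == "__main__":
    # Toy DAG: 0 -> 1 -> 3, 0 -> 2 -> 3
    adj = np.zeros((4, 4), dtype=bool)
    adj[0, 1] = adj[1, 3] = adj[0, 2] = adj[2, 3] = True
    X = np.random.randn(4, 8) + depth_positional_encoding(adj, 8)
    out = masked_attention(X, dag_attention_mask(adj))
    print(out.shape)  # (4, 8)
```

In this toy version the mask still yields a dense score matrix; the efficiency gain claimed in the abstract would come from only computing attention over the (typically sparse) reachable pairs rather than materializing all node pairs.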