Transformers flexibly operate over sets of real-valued vectors representing task-specific entities and their attributes, where each vector might encode one word-piece token and its position in a sequence, or some piece of information that carries no position at all. But as set processors, transformers are at a disadvantage in reasoning over more general graph-structured data where nodes represent entities and edges represent relations between entities. To address this shortcoming, we generalize transformer attention to consider and update edge vectors in each transformer layer. We evaluate this relational transformer on a diverse array of graph-structured tasks, including the large and challenging CLRS Algorithmic Reasoning Benchmark. There, it dramatically outperforms state-of-the-art graph neural networks expressly designed to reason over graph-structured data. Our analysis demonstrates that these gains are attributable to relational attention's inherent ability to leverage the greater expressivity of graphs over sets.
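To make the core idea concrete, below is a minimal single-head NumPy sketch of attention that reads edge vectors when forming queries, keys, and values, and also updates the edge vectors. The weight names (Wqn, Wqe, ...) and the simple edge-update rule are illustrative assumptions for this sketch, not the paper's exact parameterization.

```python
import numpy as np


def relational_attention(nodes, edges, d, seed=0):
    """Single-head sketch: attention that reads and updates edge vectors.

    nodes: (N, d) array of node vectors; edges: (N, N, d) array of edge vectors,
    where edges[i, j] relates receiver i to sender j. Weights are random here;
    a real layer would learn them. The names and the edge-update rule are
    illustrative assumptions, not the paper's exact parameterization.
    """
    rng = np.random.default_rng(seed)
    Wqn, Wqe, Wkn, Wke, Wvn, Wve = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(6)
    )
    We = rng.standard_normal((3 * d, d)) / np.sqrt(3 * d)

    N = nodes.shape[0]
    # Queries, keys, and values each mix a node vector with the incident edge vector.
    q = nodes[:, None, :] @ Wqn + edges @ Wqe   # (N, N, d): query for pair (i, j)
    k = nodes[None, :, :] @ Wkn + edges @ Wke   # (N, N, d): key for pair (i, j)
    v = nodes[None, :, :] @ Wvn + edges @ Wve   # (N, N, d): value for pair (i, j)

    scores = (q * k).sum(-1) / np.sqrt(d)       # (N, N) attention logits
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)     # softmax over senders j

    new_nodes = np.einsum("ij,ijd->id", attn, v)

    # Edge update: transform each (receiver, sender, edge) triple per pair.
    pair = np.concatenate(
        [np.broadcast_to(nodes[:, None, :], (N, N, d)),
         np.broadcast_to(nodes[None, :, :], (N, N, d)),
         edges],
        axis=-1,
    )
    new_edges = np.tanh(pair @ We)              # (N, N, d) updated edge vectors
    return new_nodes, new_edges


# Toy usage on a random 5-node fully connected graph.
nodes = np.random.randn(5, 8)
edges = np.random.randn(5, 5, 8)
new_nodes, new_edges = relational_attention(nodes, edges, d=8)
```

Stacking such layers lets information flow through both node and edge representations, which is what allows the model to exploit the extra expressivity of graphs over sets.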