Much recent work suggests that incorporating syntax information from dependency trees can improve task-specific transformer models. However, the effect of incorporating dependency tree information into pre-trained transformer models (e.g., BERT) remains unclear, especially given recent studies highlighting how these models implicitly encode syntax. In this work, we systematically study the utility of incorporating dependency trees into pre-trained transformers on three representative information extraction tasks: semantic role labeling (SRL), named entity recognition, and relation extraction. We propose and investigate two distinct strategies for incorporating dependency structure: a late fusion approach, which applies a graph neural network on the output of a transformer, and a joint fusion approach, which infuses syntax structure into the transformer attention layers. These strategies are representative of prior work, but we introduce additional model design elements that are necessary for obtaining improved performance. Our empirical analysis demonstrates that these syntax-infused transformers obtain state-of-the-art results on SRL and relation extraction tasks. However, our analysis also reveals a critical shortcoming of these models: we find that their performance gains are highly contingent on the availability of human-annotated dependency parses, which raises important questions regarding the viability of syntax-augmented transformers in real-world applications.
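To make the two fusion strategies concrete, here is a minimal, hypothetical PyTorch sketch (not the paper's exact architecture): a graph-convolution stack applied to transformer outputs for late fusion, and a syntax-biased attention score for the joint-fusion flavour. The class and function names (`GraphConvLayer`, `LateFusion`, `syntax_biased_attention`) and the specific normalisation choices are illustrative assumptions.

```python
import torch
import torch.nn as nn


class GraphConvLayer(nn.Module):
    """One graph-convolution layer over token representations.

    `adj` encodes dependency-tree edges (with self-loops), so each token
    aggregates information from its syntactic neighbours.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden); adj: (batch, seq_len, seq_len)
        degree = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)  # avoid divide-by-zero
        neighbour_sum = torch.bmm(adj, self.linear(hidden)) / degree
        return torch.relu(neighbour_sum) + hidden  # residual connection


class LateFusion(nn.Module):
    """Late fusion: stack GNN layers on top of the transformer's output."""

    def __init__(self, hidden_size: int = 768, num_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(
            GraphConvLayer(hidden_size) for _ in range(num_layers)
        )

    def forward(self, transformer_output: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        hidden = transformer_output
        for layer in self.layers:
            hidden = layer(hidden, adj)
        return hidden


def syntax_biased_attention(query, key, value, adj, bias_weight: float = 1.0):
    """Joint-fusion flavour: add a bonus to attention logits for
    dependency-connected token pairs before the softmax."""
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / d_k ** 0.5
    scores = scores + bias_weight * adj  # syntax-aware bias on the attention scores
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, value)
```

In practice, `adj` would be built from the predicted (or gold) dependency parse of each sentence, aligned to the transformer's subword tokens; the abstract's finding is precisely that the quality of this parse largely determines whether such fusion helps.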