Pre-trained language models like BERT achieve superior performance on various NLP tasks without explicit consideration of syntactic information, yet syntactic information has proven crucial to the success of many NLP applications. How to incorporate syntax trees effectively and efficiently into pre-trained Transformers, however, remains an open question. In this paper, we address this problem by proposing a novel framework named Syntax-BERT. The framework works in a plug-and-play fashion and is applicable to any pre-trained checkpoint based on the Transformer architecture. Experiments on a variety of natural language understanding datasets verify the effectiveness of syntax trees, showing consistent improvements over multiple pre-trained models, including BERT, RoBERTa, and T5.