Recently, the pre-trained language model BERT (Devlin et al., 2018) has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity, and question answering. Inspired by Elman's early work on the linearization of language (Elman, 1990), we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and sentences, leveraging language structures at the word and sentence levels, respectively. As a result, the new model is adapted to the different levels of language understanding required by downstream tasks. StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, including pushing the state of the art on the GLUE benchmark to 84.5 (ranked first on the leaderboard at the time of paper submission), the F1 score on SQuAD v1.1 question answering to 93.0, and the accuracy on SNLI to 91.7.
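To make the two auxiliary objectives concrete, the sketch below shows how their pre-training examples could be constructed: a word-level task that shuffles short spans of tokens and asks the model to recover the original order, and a sentence-level task that asks whether a second segment follows, precedes, or is unrelated to the first. This is a minimal illustration under assumptions drawn from the body of the paper (trigram-length spans, a three-way sentence-order label); all function and parameter names here are hypothetical, not part of the released model.

```python
import random

def make_word_structural_example(tokens, shuffle_ratio=0.05, ngram=3):
    """Word-level objective (sketch): shuffle a few n-gram spans in place and
    keep the original sequence as the reconstruction target.
    Parameter values are illustrative assumptions."""
    shuffled = list(tokens)
    targets = list(tokens)                       # original order to recover
    n_spans = max(1, int(len(tokens) * shuffle_ratio))
    for _ in range(n_spans):
        start = random.randrange(0, max(1, len(shuffled) - ngram))
        span = shuffled[start:start + ngram]
        random.shuffle(span)                     # permute the short span
        shuffled[start:start + ngram] = span
    return shuffled, targets                     # (shuffled input, reconstruction target)

def make_sentence_structural_example(prev_sent, sent, next_sent, random_sent):
    """Sentence-level objective (sketch): a three-way classification of whether
    the second segment is the next sentence, the previous sentence, or a
    random sentence from another document."""
    label = random.choice(["next", "previous", "random"])
    second = {"next": next_sent, "previous": prev_sent, "random": random_sent}[label]
    return (sent, second), label                 # (segment pair, order label)
```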