We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (a 7.7 percentage point absolute improvement), MultiNLI accuracy to 86.7% (a 4.6 percentage point absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (a 1.5 point absolute improvement), and SQuAD v2.0 Test F1 to 83.1 (a 5.1 point absolute improvement).
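To make "jointly conditioning on both left and right context" concrete, the following is a minimal sketch of which positions each token may attend to under a left-to-right model versus a bidirectional one. It is an illustration only, not BERT's implementation: the helper names `visibility_mask` and `visible_context` and the example sentence are assumptions introduced here, and the pre-training objective and the Transformer layers themselves are left out.

```python
def visibility_mask(seq_len, bidirectional):
    """Build a seq_len x seq_len matrix of 0/1 flags.

    Entry [i][j] is 1 when position i is allowed to attend to position j.
    (Hypothetical helper for illustration.)
    """
    return [
        # Left-to-right: only positions j <= i are visible.
        # Bidirectional: every position is visible.
        [1 if bidirectional or j <= i else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


def visible_context(mask, tokens, i):
    """Return the tokens that position i can condition on under the given mask."""
    return [tok for tok, flag in zip(tokens, mask[i]) if flag]


# Example sentence chosen for illustration; it is not taken from the paper.
tokens = ["the", "bank", "of", "the", "river"]

left_to_right = visibility_mask(len(tokens), bidirectional=False)
both_sides = visibility_mask(len(tokens), bidirectional=True)

# Context available when encoding "bank" (position 1).
print(visible_context(left_to_right, tokens, 1))  # ['the', 'bank']
print(visible_context(both_sides, tokens, 1))     # ['the', 'bank', 'of', 'the', 'river']
```

In the left-to-right case the representation of "bank" cannot use "river", whereas the bidirectional mask exposes the full sentence at every position, which is the property the abstract attributes to every layer of BERT.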