We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (a 7.6-point absolute improvement), MultiNLI accuracy to 86.7% (a 5.6-point absolute improvement), and the SQuAD v1.1 question answering Test F1 to 93.2 (a 1.5-point absolute improvement), exceeding human performance by 2.0 points.
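To make "jointly conditioning on both left and right context" concrete, the following minimal sketch contrasts which positions a token may attend to under a left-to-right model and under a bidirectional one. The helper names `attention_mask` and `visible_context` and the example sentence are hypothetical, chosen only for illustration; the sketch shows context visibility and is not BERT's actual implementation or training objective.

```python
def attention_mask(seq_len, bidirectional):
    """Build a seq_len x seq_len visibility matrix.

    mask[i][j] is True when position i may attend to position j.
    Left-to-right: only positions j <= i are visible.
    Bidirectional: every position is visible.
    """
    return [
        [bidirectional or j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


def visible_context(tokens, position, bidirectional):
    """List the other tokens that the token at `position` can condition on."""
    mask = attention_mask(len(tokens), bidirectional)
    return [
        tok
        for j, tok in enumerate(tokens)
        if mask[position][j] and j != position
    ]


# Hypothetical example sentence; "bank" is ambiguous without right context.
tokens = ["the", "bank", "of", "the", "river"]

left_only = visible_context(tokens, 1, bidirectional=False)
both_sides = visible_context(tokens, 1, bidirectional=True)

print(left_only)   # ['the']
print(both_sides)  # ['the', 'of', 'the', 'river']
```

In the left-to-right case the representation of "bank" can draw only on "the", while in the bidirectional case it can also use "of the river", which is the kind of right-hand context the abstract refers to.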