This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling objectives: unidirectional (both left-to-right and right-to-left), bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. We can fine-tune UniLM as a unidirectional decoder, a bidirectional encoder, or a sequence-to-sequence model to support various downstream natural language understanding and generation tasks. UniLM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, our model achieves new state-of-the-art results on three natural language generation tasks, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.63 (2.16 absolute improvement), pushing the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), and the SQuAD question generation BLEU-4 to 22.88 (6.50 absolute improvement).
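To make the shared-Transformer-plus-mask idea concrete, the sketch below (not the authors' code) shows one way the three self-attention masks could be constructed. It assumes a PyTorch setting where a mask of additive attention biases is added to the attention logits, with 0 allowing attention and a large negative value blocking it; the helper names, the NEG_INF constant, and the segment-length parameters n, n_src, and n_tgt are illustrative assumptions, not part of the paper.

import torch

NEG_INF = -1e4  # added to attention logits to block attention (assumed convention)


def bidirectional_mask(n: int) -> torch.Tensor:
    # Bidirectional LM objective: every token may attend to every other token.
    return torch.zeros(n, n)


def left_to_right_mask(n: int) -> torch.Tensor:
    # Unidirectional LM objective: token i may attend only to tokens 0..i
    # (a standard causal mask; the right-to-left case reverses the sequence).
    mask = torch.full((n, n), NEG_INF)
    return torch.triu(mask, diagonal=1)  # block strictly-upper triangle, allow the rest


def seq2seq_mask(n_src: int, n_tgt: int) -> torch.Tensor:
    # Sequence-to-sequence objective: source tokens attend bidirectionally
    # within the source segment; target tokens attend to the full source
    # plus the already-generated prefix of the target.
    n = n_src + n_tgt
    mask = torch.full((n, n), NEG_INF)
    mask[:n_src, :n_src] = 0.0                                         # source <-> source
    mask[n_src:, :n_src] = 0.0                                         # target  -> source
    mask[n_src:, n_src:] = torch.triu(                                 # target  -> target (causal)
        torch.full((n_tgt, n_tgt), NEG_INF), diagonal=1)
    return mask

In use, the chosen mask would simply be added to the scaled dot-product attention scores of the shared Transformer (e.g., scores = q @ k.transpose(-2, -1) / d ** 0.5 + mask), so the same network parameters serve all three pre-training objectives and, after fine-tuning, the corresponding downstream configurations.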