This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling objectives: unidirectional (both left-to-right and right-to-left), bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. We can fine-tune UniLM as a unidirectional decoder, a bidirectional encoder, or a sequence-to-sequence model to support various downstream natural language understanding and generation tasks. UniLM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, our model achieves new state-of-the-art results on three natural language generation tasks, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.63 (2.16 absolute improvement), pushing the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), and lifting the SQuAD question generation BLEU-4 to 22.88 (6.50 absolute improvement). The code and pre-trained models are available at https://github.com/microsoft/unilm.
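To make the masking idea concrete, the sketch below shows how the three modeling objectives can be expressed purely as different self-attention masks over a shared Transformer. This is a minimal illustration, not the repository's actual implementation; the function name `build_attention_mask`, the `mode` strings, and the `src_len` parameter are assumptions introduced here for clarity.

```python
import torch

def build_attention_mask(seq_len, mode, src_len=None):
    """Build a UniLM-style self-attention mask (1 = may attend, 0 = blocked).

    mode: "bidirectional", "left_to_right", "right_to_left", or "seq2seq".
    src_len: length of the source segment, required for "seq2seq"; the
             remaining seq_len - src_len positions form the target segment.
    """
    if mode == "bidirectional":
        # Every token may attend to every token (BERT-style encoding).
        return torch.ones(seq_len, seq_len, dtype=torch.long)
    if mode == "left_to_right":
        # Token i may attend only to positions <= i (causal decoding).
        return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.long))
    if mode == "right_to_left":
        # Token i may attend only to positions >= i.
        return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.long))
    if mode == "seq2seq":
        # Source tokens attend bidirectionally within the source; target
        # tokens attend to the full source plus preceding target tokens.
        mask = torch.zeros(seq_len, seq_len, dtype=torch.long)
        mask[:, :src_len] = 1  # all positions may see the source
        tgt_len = seq_len - src_len
        mask[src_len:, src_len:] = torch.tril(
            torch.ones(tgt_len, tgt_len, dtype=torch.long)
        )
        return mask
    raise ValueError(f"unknown mode: {mode}")

# Example: a 6-token sequence with a 3-token source and 3-token target.
print(build_attention_mask(6, "seq2seq", src_len=3))
```

Because the only difference between the objectives is the mask, the same Transformer parameters are shared across all three pre-training tasks, which is what allows one checkpoint to be fine-tuned as an encoder, a decoder, or a sequence-to-sequence model.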