While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, a gap remains between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework, where pretrained encoders can only benefit part of the model. To reduce this gap, we introduce DeltaLM, a pretrained multilingual encoder-decoder model that regards the decoder as the task layer of off-the-shelf pretrained encoders. Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way. To take advantage of both large-scale monolingual data and bilingual data, we adopt span corruption and translation span corruption as the pre-training tasks. Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks, including machine translation, abstractive text summarization, data-to-text, and question generation. The code and pretrained models are available at \url{https://aka.ms/deltalm}.
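To make the two pre-training objectives concrete, the following is a minimal Python sketch of how training examples could be constructed: span corruption replaces contiguous spans of a monolingual sequence with sentinel tokens and asks the decoder to reconstruct the dropped spans, while translation span corruption applies the same procedure to a concatenated bilingual pair. The sentinel format, masking ratio, and span-length handling here are illustrative assumptions, not the exact DeltaLM recipe; the released code at the URL above contains the actual implementation.

```python
import random

def span_corruption(tokens, mask_ratio=0.15, span_len=3, sentinel="<extra_id_{}>"):
    """Illustrative T5-style span corruption (assumed hyperparameters).

    Random contiguous spans of the input are replaced by sentinel tokens;
    the target sequence lists each sentinel followed by the tokens it hides.
    """
    budget = max(1, int(len(tokens) * mask_ratio))  # how many tokens to mask
    source, target = [], []
    i, sid, masked = 0, 0, 0
    while i < len(tokens):
        if masked < budget and random.random() < mask_ratio:
            span = min(span_len, len(tokens) - i)
            source.append(sentinel.format(sid))          # sentinel in the input
            target.append(sentinel.format(sid))          # sentinel in the target
            target.extend(tokens[i:i + span])            # followed by the hidden span
            sid += 1
            masked += span
            i += span
        else:
            source.append(tokens[i])
            i += 1
    return source, target

def translation_span_corruption(src_tokens, tgt_tokens, **kwargs):
    """Illustrative translation span corruption: concatenate a bilingual
    sentence pair and apply span corruption over the combined sequence,
    so reconstruction uses cross-lingual context."""
    return span_corruption(src_tokens + tgt_tokens, **kwargs)

# Usage example (hypothetical toy sentence pair):
if __name__ == "__main__":
    en = "the cat sat on the mat".split()
    de = "die Katze saß auf der Matte".split()
    print(span_corruption(en))
    print(translation_span_corruption(en, de))
```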