While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework, where pretrained encoders can only benefit part of it. To reduce this gap, we introduce DeltaLM, a pretrained multilingual encoder-decoder model that regards the decoder as the task layer of off-the-shelf pretrained encoders. Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way. To take advantage of both large-scale monolingual data and bilingual data, we adopt span corruption and translation span corruption as the pre-training tasks. Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks, including machine translation, abstractive text summarization, data-to-text, and question generation.
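To make the two pre-training objectives concrete, the following is a minimal sketch (not the authors' implementation) of how span corruption and translation span corruption could construct (input, target) pairs. The sentinel-token naming, masking ratio, and span-length choices are illustrative assumptions rather than DeltaLM's exact settings.

```python
import random
from typing import List, Tuple


def span_corrupt(tokens: List[str], mask_ratio: float = 0.15,
                 mean_span: int = 3) -> Tuple[List[str], List[str]]:
    """Span corruption: replace random spans with sentinel tokens; the
    target reproduces each masked span after its sentinel (T5-style)."""
    source, target = [], []
    i, sentinel_id = 0, 0
    while i < len(tokens):
        # Start a masked span with probability chosen so that roughly
        # mask_ratio of tokens end up masked in spans of ~mean_span tokens.
        if random.random() < mask_ratio / mean_span:
            span_len = max(1, min(mean_span, len(tokens) - i))
            sentinel = f"<extra_id_{sentinel_id}>"
            source.append(sentinel)
            target.append(sentinel)
            target.extend(tokens[i:i + span_len])
            sentinel_id += 1
            i += span_len
        else:
            source.append(tokens[i])
            i += 1
    return source, target


def translation_span_corrupt(src_tokens: List[str],
                             tgt_tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Translation span corruption: concatenate a translation pair and apply
    span corruption to the combined sequence, so masked spans must be
    recovered using cross-lingual context from the other language."""
    return span_corrupt(src_tokens + ["</s>"] + tgt_tokens)


if __name__ == "__main__":
    mono = "DeltaLM augments a pretrained multilingual encoder with a decoder".split()
    print(span_corrupt(mono))
    en = "Thank you very much".split()
    de = "Vielen Dank".split()
    print(translation_span_corrupt(en, de))
```

The first objective uses only monolingual data, while the second exposes the model to bilingual supervision in the same sequence-to-sequence format, which is how the pre-training can exploit both kinds of data within one encoder-decoder framework.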