For multilingual sequence-to-sequence pretrained language models (multilingual Seq2Seq PLMs), e.g., mBART, the self-supervised pretraining task covers a wide range of monolingual languages, e.g., 25 languages from CommonCrawl, whereas downstream cross-lingual tasks generally involve only a bilingual language subset, e.g., English-German. This induces a cross-lingual data discrepancy, namely the \textit{domain discrepancy}, and a cross-lingual learning objective discrepancy, namely the \textit{task discrepancy}, between the pretraining and fine-tuning stages. To bridge these cross-lingual domain and task gaps, we extend the vanilla pretrain-finetune pipeline with an extra code-switching restoration task. Specifically, the first stage employs the self-supervised code-switching restoration task as a pretext task, allowing the multilingual Seq2Seq PLM to acquire in-domain alignment information. In the second stage, we fine-tune the model on labeled data as usual. Experiments on a variety of cross-lingual NLG tasks, including 12 bilingual translation tasks, 36 zero-shot translation tasks, and cross-lingual summarization tasks, show that our model consistently outperforms the strong baseline mBART. Comprehensive analyses indicate that our approach narrows the cross-lingual sentence representation distance and improves low-frequency word translation, at negligible computational cost.
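To make the code-switching restoration pretext task concrete, the following is a minimal sketch of how such training pairs could be built; it assumes a simple word-level bilingual dictionary and a replacement ratio (both hypothetical choices not specified in the abstract), and is not the paper's exact implementation.

\begin{verbatim}
import random

def make_codeswitch_example(src_tokens, bilingual_dict, replace_ratio=0.3):
    """Build a (corrupted, original) pair for code-switching restoration.

    Hypothetical sketch: a fraction of source-language tokens found in a
    bilingual dictionary are swapped for target-language translations; the
    Seq2Seq PLM is then trained to restore the original sentence.
    """
    corrupted = []
    for tok in src_tokens:
        if tok in bilingual_dict and random.random() < replace_ratio:
            corrupted.append(random.choice(bilingual_dict[tok]))  # target word
        else:
            corrupted.append(tok)
    # Encoder input: code-switched sentence; decoder target: original sentence
    return corrupted, src_tokens

# Toy English-German dictionary, for illustration only
toy_dict = {"house": ["Haus"], "cat": ["Katze"], "drinks": ["trinkt"]}
src = "the cat drinks milk in the house".split()
corrupted, target = make_codeswitch_example(src, toy_dict, replace_ratio=0.5)
print("encoder input :", " ".join(corrupted))
print("decoder target:", " ".join(target))
\end{verbatim}

In this setup, the restoration objective encourages the model to align code-switched target-language words with their source-language counterparts, which is one plausible way the in-domain alignment information mentioned above could be acquired.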