Transfer Learning for Sequence Generation: from Single-source to Multi-source

Multi-source sequence generation (MSG) is an important kind of sequence generation task that takes multiple sources as input; examples include automatic post-editing, multi-source translation, and multi-document summarization. Because MSG tasks suffer from data scarcity and recent pretrained models have proven effective for low-resource downstream tasks, transferring pretrained sequence-to-sequence models to MSG tasks is essential. A simple way to do this is to concatenate the multiple sources into a single long sequence and finetune the pretrained model directly on the MSG task. However, we conjecture that direct finetuning leads to catastrophic forgetting, and that relying solely on pretrained self-attention layers to capture cross-source information is not sufficient. Therefore, we propose a two-stage finetuning method to alleviate the pretrain-finetune discrepancy and introduce a novel MSG model with a fine encoder to learn better representations in MSG tasks. Experiments show that our approach achieves new state-of-the-art results on the WMT17 APE task and on the multi-source translation task using the WMT14 test set. When adapted to document-level translation, our framework significantly outperforms strong baselines.
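The concatenation baseline mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how sources are joined, so the separator token `<sep>`, the function name `concat_sources`, and the per-token source indices below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple


def concat_sources(
    sources: List[List[str]], sep: str = "<sep>"
) -> Tuple[List[str], List[int]]:
    """Join tokenized sources into one long sequence.

    `sep` is a hypothetical separator token. The second return value is a
    parallel list of source indices, so a model could tell which source
    each token came from.
    """
    tokens: List[str] = []
    source_ids: List[int] = []
    for idx, src in enumerate(sources):
        if idx > 0:
            # The separator is assigned to the source it introduces.
            tokens.append(sep)
            source_ids.append(idx)
        tokens.extend(src)
        source_ids.extend([idx] * len(src))
    return tokens, source_ids


# Example: an APE-style input pairing a source sentence with its machine translation.
src = ["Das", "ist", "gut"]
mt = ["This", "is", "well"]
tokens, source_ids = concat_sources([src, mt])
```

A single pretrained encoder would then process `tokens` as one sequence, which is why cross-source interactions in this baseline depend entirely on its self-attention layers.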

