Improving Non-autoregressive Generation with Mixup Training

While pre-trained language models have achieved great success on various natural language understanding tasks, how to effectively leverage them for non-autoregressive generation tasks remains a challenge. To solve this problem, we present a non-autoregressive generation model based on pre-trained transformer models. To bridge the gap between autoregressive and non-autoregressive models, we propose a simple and effective iterative training method called MIx Source and pseudo Target (MIST). Unlike other iterative decoding methods, which sacrifice inference speed to achieve better performance through multiple decoding iterations, MIST works in the training stage and has no effect on inference time. Our experiments on three generation benchmarks, covering question generation, summarization and paraphrase generation, show that the proposed framework achieves new state-of-the-art results for fully non-autoregressive models. We also demonstrate that our method can be applied to a variety of pre-trained models. For instance, MIST built on a small pre-trained model also obtains performance comparable to seq2seq models.
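The claim about inference cost can be made concrete by counting decoder passes. The sketch below uses a hypothetical cost model in which one model call is one unit of latency. The function names and the toy stand-in "models" are illustrative assumptions, not code from the paper. Autoregressive decoding needs one call per output token, fully non-autoregressive decoding needs a single parallel call, and iterative decoding needs one call per refinement round. Because MIST only changes training, a MIST-trained model would still decode through the single-pass path.

```python
from typing import Callable, List, Tuple

Token = int


def autoregressive_decode(next_token: Callable[[List[Token]], Token],
                          length: int) -> Tuple[List[Token], int]:
    """Left-to-right decoding: one model call per output token."""
    output: List[Token] = []
    calls = 0
    for _ in range(length):
        output.append(next_token(output))  # each token conditions on the prefix so far
        calls += 1
    return output, calls


def non_autoregressive_decode(predict_all: Callable[[int], List[Token]],
                              length: int) -> Tuple[List[Token], int]:
    """Fully non-autoregressive decoding: all positions predicted in one call."""
    return predict_all(length), 1


def iterative_decode(predict_all: Callable[[int], List[Token]],
                     refine: Callable[[List[Token]], List[Token]],
                     length: int, iterations: int) -> Tuple[List[Token], int]:
    """Iterative decoding: one parallel pass plus (iterations - 1) refinement passes."""
    output = predict_all(length)
    calls = 1
    for _ in range(iterations - 1):
        output = refine(output)  # each refinement round costs another model call
        calls += 1
    return output, calls


if __name__ == "__main__":
    # Hypothetical toy "models" that simply emit position indices.
    next_token = lambda prefix: len(prefix)
    predict_all = lambda n: list(range(n))
    refine = lambda tokens: tokens

    n = 20
    print("AR calls:", autoregressive_decode(next_token, n)[1])                 # 20
    print("NAR calls:", non_autoregressive_decode(predict_all, n)[1])          # 1
    print("Iterative calls:", iterative_decode(predict_all, refine, n, 4)[1])  # 4
```

All three toy decoders return the same sequence here. Only the number of sequential model calls differs, and that is the quantity training-time methods such as MIST leave unchanged.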