We study text generation with pre-trained language models (PLMs). Typically, an autoregressive (AR) method generates text in a token-by-token manner. Despite its many advantages, AR generation usually suffers from inefficient inference. Non-autoregressive (NAR) models have therefore been proposed to generate all target tokens simultaneously; however, they usually produce text of lower quality because token dependencies in the output are not modeled. In this paper, we propose ELMER, an efficient and effective PLM for NAR text generation that explicitly models token dependencies during generation. By leveraging the early exit technique, ELMER allows tokens to be generated at different layers according to their prediction confidence (a more confident token exits at a lower layer). We further propose a novel pre-training objective, Layer Permutation Language Modeling, which pre-trains ELMER by permuting the exit layer of each token in a sequence. Experiments on three text generation tasks show that ELMER significantly outperforms NAR models and further narrows the performance gap with AR PLMs (e.g., ELMER 29.92 vs. BART 30.61 ROUGE-L on XSUM), while achieving over 10x inference speedup.
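To make the early-exit idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: the toy decoder, the number of layers, the shared LM head, and the 0.9 confidence threshold are all illustrative assumptions. All target positions are predicted in parallel, and each position is fixed at the first layer whose softmax confidence clears the threshold.

```python
# A minimal, self-contained sketch of confidence-based early exit for
# non-autoregressive (NAR) decoding, in the spirit of ELMER. This is NOT the
# paper's implementation; the architecture and threshold are illustrative.
import torch
import torch.nn as nn


class ToyEarlyExitNARDecoder(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Bidirectional self-attention over all target positions (NAR setting).
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )
        self.lm_head = nn.Linear(d_model, vocab_size)  # one head shared by all layers

    @torch.no_grad()
    def generate(self, input_ids, confidence_threshold=0.9):
        """Predict all positions in parallel; each position 'exits' at the
        first layer whose max softmax probability clears the threshold."""
        h = self.embed(input_ids)                      # (batch, seq, d_model)
        batch, seq = input_ids.shape
        tokens = torch.zeros(batch, seq, dtype=torch.long)
        exit_layer = torch.full((batch, seq), -1)
        exited = torch.zeros(batch, seq, dtype=torch.bool)
        for layer_idx, layer in enumerate(self.layers):
            h = layer(h)
            probs = self.lm_head(h).softmax(dim=-1)    # per-position distributions
            conf, pred = probs.max(dim=-1)
            passes = conf >= confidence_threshold
            if layer_idx == len(self.layers) - 1:      # force remaining positions out
                passes = torch.ones_like(passes)
            newly_exited = ~exited & passes
            tokens[newly_exited] = pred[newly_exited]
            exit_layer[newly_exited] = layer_idx
            exited |= newly_exited
            # For simplicity, higher layers still recompute exited positions;
            # they just can no longer change the emitted token.
        return tokens, exit_layer


decoder = ToyEarlyExitNARDecoder()
decoder.eval()
placeholder_targets = torch.randint(0, 1000, (1, 8))   # 8 masked target slots
out_tokens, out_layers = decoder.generate(placeholder_targets)
print(out_tokens.shape, out_layers)                    # confident tokens exit at lower layers
```

During pre-training, a layer-permutation objective along the lines described above would instead sample a permutation of exit layers for the tokens in each sequence and compute the language-modeling loss at those sampled layers; that training loop is omitted from this sketch.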