We study text generation with pre-trained language models (PLMs). Typically, an autoregressive (AR) method generates text in a token-by-token manner. Despite its many advantages, AR generation usually suffers from inefficient inference. Non-autoregressive (NAR) models have therefore been proposed to generate all target tokens simultaneously; however, they usually produce text of lower quality because token dependencies in the output are not modeled. In this paper, we propose ELMER, an efficient and effective PLM for NAR text generation that explicitly models token dependencies during generation. By leveraging the early exit technique, ELMER allows tokens to be generated at different layers according to their prediction confidence (a more confident token exits at a lower layer). We further propose a novel pre-training objective, Layer Permutation Language Modeling, which pre-trains ELMER by permuting the exit layer of each token in a sequence. Experiments on three text generation tasks show that ELMER significantly outperforms NAR models and further narrows the performance gap with AR PLMs (e.g., ELMER 29.92 vs. BART 30.61 ROUGE-L on XSUM), while achieving over 10x inference speedup.
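To make the early-exit idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: the toy decoder, the number of layers, the shared LM head, and the 0.9 confidence threshold are all illustrative assumptions. All target positions are predicted in parallel, and each position is fixed at the first layer whose softmax confidence clears the threshold.

```python
# A minimal, self-contained sketch of confidence-based early exit for
# non-autoregressive (NAR) decoding, in the spirit of ELMER. This is NOT the
# paper's implementation; the architecture and threshold are illustrative.
import torch
import torch.nn as nn


class ToyEarlyExitNARDecoder(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Bidirectional self-attention over all target positions (NAR setting).
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )
        self.lm_head = nn.Linear(d_model, vocab_size)  # one head shared by all layers

    @torch.no_grad()
    def generate(self, input_ids, confidence_threshold=0.9):
        """Predict all positions in parallel; each position 'exits' at the
        first layer whose max softmax probability clears the threshold."""
        h = self.embed(input_ids)                      # (batch, seq, d_model)
        batch, seq = input_ids.shape
        tokens = torch.zeros(batch, seq, dtype=torch.long)
        exit_layer = torch.full((batch, seq), -1)
        exited = torch.zeros(batch, seq, dtype=torch.bool)
        for layer_idx, layer in enumerate(self.layers):
            h = layer(h)
            probs = self.lm_head(h).softmax(dim=-1)    # per-position distributions
            conf, pred = probs.max(dim=-1)
            passes = conf >= confidence_threshold
            if layer_idx == len(self.layers) - 1:      # force remaining positions out
                passes = torch.ones_like(passes)
            newly_exited = ~exited & passes
            tokens[newly_exited] = pred[newly_exited]
            exit_layer[newly_exited] = layer_idx
            exited |= newly_exited
            # For simplicity, higher layers still recompute exited positions;
            # they just can no longer change the emitted token.
        return tokens, exit_layer


decoder = ToyEarlyExitNARDecoder()
decoder.eval()
placeholder_targets = torch.randint(0, 1000, (1, 8))   # 8 masked target slots
out_tokens, out_layers = decoder.generate(placeholder_targets)
print(out_tokens.shape, out_layers)                    # confident tokens exit at lower layers
```

During pre-training, a layer-permutation objective along the lines described above would instead sample a permutation of exit layers for the tokens in each sequence and compute the language-modeling loss at those sampled layers; that training loop is omitted from this sketch.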