In this paper, we propose BANG, a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation. AR and NAR generation can be uniformly regarded as differing in the extent to which previous tokens can be attended to, and BANG bridges AR and NAR generation by designing a novel model structure for large-scale pretraining. The pretrained BANG model can simultaneously support AR, NAR, and semi-NAR generation to meet different requirements. Experiments on question generation (SQuAD 1.1), summarization (XSum), and dialogue generation (PersonaChat) show that BANG improves NAR and semi-NAR performance significantly while attaining performance comparable to strong AR pretrained models. Compared with strong semi-NAR baselines, BANG achieves absolute improvements of 14.01 and 5.24 in the overall scores on SQuAD 1.1 and XSum, respectively. In addition, compared with strong NAR baselines, BANG achieves absolute improvements of 10.73, 6.39, and 5.90 in the overall scores on SQuAD 1.1, XSum, and PersonaChat, respectively.
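To make the unifying view concrete, the following is a minimal illustrative sketch, not the paper's implementation, of how AR, NAR, and semi-NAR decoding can be seen as differing only in which previous target positions each decoding step may attend to. The function name `build_target_mask` and the `visible_prefix_len` parameter are hypothetical and introduced only for illustration.

```python
from typing import List, Optional

def build_target_mask(tgt_len: int, visible_prefix_len: Optional[int]) -> List[List[bool]]:
    """Return a tgt_len x tgt_len boolean self-attention mask over target positions.

    mask[i][j] is True if decoding position i may attend to target position j.
    - AR:       visible_prefix_len = None  -> position i sees all j < i (causal mask)
    - NAR:      visible_prefix_len = 0     -> no previous target tokens are visible
    - semi-NAR: visible_prefix_len = k > 0 -> only the first k target tokens are visible
    """
    mask = [[False] * tgt_len for _ in range(tgt_len)]
    for i in range(tgt_len):
        if visible_prefix_len is None:       # AR: full causal attention
            limit = i
        else:                                 # NAR / semi-NAR: fixed visible prefix
            limit = min(visible_prefix_len, i)
        for j in range(limit):
            mask[i][j] = True
    return mask

if __name__ == "__main__":
    # Print the masks for a length-4 target under the three decoding regimes.
    for name, k in [("AR", None), ("NAR", 0), ("semi-NAR", 2)]:
        print(name)
        for row in build_target_mask(4, k):
            print(["x" if visible else "." for visible in row])
```

Under this assumed formulation, a single pretrained model can serve all three regimes by switching the mask at inference time, which is the sense in which AR and NAR generation are bridged.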