In this paper, we propose BANG, a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation. AR and NAR generation can be uniformly regarded as differing in the extent to which previous tokens can be attended to, and BANG bridges AR and NAR generation by designing a novel model structure for large-scale pretraining. The pretrained BANG model can simultaneously support AR, NAR, and semi-NAR generation to meet different requirements. Experiments on question generation (SQuAD 1.1), summarization (XSum), and dialogue generation (PersonaChat) show that BANG improves NAR and semi-NAR performance significantly while attaining performance comparable to strong AR pretrained models. Compared with strong semi-NAR baselines, BANG achieves absolute improvements of 14.01 and 5.24 in the overall scores on SQuAD 1.1 and XSum, respectively. In addition, compared with strong NAR baselines, BANG achieves absolute improvements of 10.73, 6.39, and 5.90 in the overall scores on SQuAD 1.1, XSum, and PersonaChat, respectively.
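To make the unifying view concrete, the following is a minimal illustrative sketch, not the paper's implementation, of how AR, NAR, and semi-NAR decoding can be seen as differing only in which previous target positions each decoding step may attend to. The function name `build_target_mask` and the `visible_prefix_len` parameter are hypothetical and introduced only for illustration.

```python
from typing import List, Optional

def build_target_mask(tgt_len: int, visible_prefix_len: Optional[int]) -> List[List[bool]]:
    """Return a tgt_len x tgt_len boolean self-attention mask over target positions.

    mask[i][j] is True if decoding position i may attend to target position j.
    - AR:       visible_prefix_len = None  -> position i sees all j < i (causal mask)
    - NAR:      visible_prefix_len = 0     -> no previous target tokens are visible
    - semi-NAR: visible_prefix_len = k > 0 -> only the first k target tokens are visible
    """
    mask = [[False] * tgt_len for _ in range(tgt_len)]
    for i in range(tgt_len):
        if visible_prefix_len is None:       # AR: full causal attention
            limit = i
        else:                                 # NAR / semi-NAR: fixed visible prefix
            limit = min(visible_prefix_len, i)
        for j in range(limit):
            mask[i][j] = True
    return mask

if __name__ == "__main__":
    # Print the masks for a length-4 target under the three decoding regimes.
    for name, k in [("AR", None), ("NAR", 0), ("semi-NAR", 2)]:
        print(name)
        for row in build_target_mask(4, k):
            print(["x" if visible else "." for visible in row])
```

Under this assumed formulation, a single pretrained model can serve all three regimes by switching the mask at inference time, which is the sense in which AR and NAR generation are bridged.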