DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models


November 30, 2022


Zhengfu He, Tianxiang Sun, Kuanning Wang, Xuanjing Huang, Xipeng Qiu

From arXiv. Work in progress. Code publicly available at https://github.com/Hzfinfdu/Diffusion-BERT

We present DiffusionBERT, a new generative masked language model based on discrete diffusion models. Diffusion models and many pre-trained language models have a shared training objective, i.e., denoising, making it possible to combine the two powerful models and enjoy the best of both worlds. On the one hand, diffusion models offer a promising training strategy that helps improve the generation quality. On the other hand, pre-trained denoising language models (e.g., BERT) can be used as a good initialization that accelerates convergence. We explore training BERT to learn the reverse process of a discrete diffusion process with an absorbing state and elucidate several designs to improve it. First, we propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step based on the information of each token. Second, we investigate several designs of incorporating the time step into BERT. Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text (e.g., D3PM and Diffusion-LM) and previous generative masked language models in terms of perplexity and BLEU score.
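To make the "discrete diffusion process with an absorbing state" concrete, the following is a minimal sketch of a forward process in which each token is independently replaced by a [MASK] symbol and then stays masked. It is not the authors' implementation. It assumes a uniform per-step rate of 1 / (T - t + 1), so that a token is masked by step t with probability t / T; the token-information-based schedule proposed in the paper is not reproduced here. The names `MASK`, `mask_probability`, and `forward_diffuse` are hypothetical and chosen for illustration only.

```python
import random

MASK = "[MASK]"  # absorbing state: once a token is masked it never changes back


def mask_probability(t: int, num_steps: int) -> float:
    """Probability that a token has been absorbed into MASK by step t.

    Assumption (not from the paper): the per-step masking rate is
    beta_s = 1 / (num_steps - s + 1), which telescopes to t / num_steps.
    """
    keep = 1.0
    for s in range(1, t + 1):
        keep *= 1.0 - 1.0 / (num_steps - s + 1)
    return 1.0 - keep


def forward_diffuse(tokens, t, num_steps, rng):
    """Sample a noised sequence x_t from clean tokens x_0 in one shot."""
    p = mask_probability(t, num_steps)
    return [MASK if rng.random() < p else tok for tok in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "diffusion models denoise text step by step".split()
    for t in (0, 5, 10):
        print(t, forward_diffuse(sentence, t, num_steps=10, rng=rng))
```

At t = 0 the sequence is untouched and at t = T every token is masked. A denoising model such as BERT would then be trained to predict the original tokens from the partially masked sequence at intermediate steps, which is the reverse process the abstract refers to.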

