DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models - Zhuanzhi Papers

November 28, 2022

Zhengfu He, Tianxiang Sun, Kuanning Wang, Xuanjing Huang, Xipeng Qiu

From arXiv (work in progress)

We present DiffusionBERT, a new generative masked language model based on discrete diffusion models. Diffusion models and many pre-trained language models have a shared training objective, i.e., denoising, making it possible to combine the two powerful models and enjoy the best of both worlds. On the one hand, diffusion models offer a promising training strategy that helps improve the generation quality. On the other hand, pre-trained denoising language models (e.g., BERT) can be used as a good initialization that accelerates convergence. We explore training BERT to learn the reverse process of a discrete diffusion process with an absorbing state and elucidate several designs to improve it. First, we propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step based on the information of each token. Second, we investigate several designs of incorporating the time step into BERT. Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text (e.g., D3PM and Diffusion-LM) and previous generative masked language models in terms of perplexity and BLEU score.
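The forward process the abstract refers to can be illustrated with a small sketch. In absorbing-state discrete diffusion, each token of x_0 either survives to step t with probability alpha_bar_t or is replaced by a single absorbing [MASK] token. The Python below is a minimal sketch under stated assumptions. The per-token information score (normalized unigram self-information) and the sine-shaped shift that masks higher-information tokens earlier are illustrative stand-ins, not the exact noise schedule proposed in the paper. All function names (self_information, keep_prob, q_sample) are hypothetical.

```python
import math
import random
from collections import Counter

MASK = "[MASK]"  # the single absorbing state


def self_information(corpus):
    """Hypothetical per-token information score in [0, 1]:
    unigram self-information -log p(w), divided by its maximum."""
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    info = {w: -math.log(c / total) for w, c in counts.items()}
    top = max(info.values()) or 1.0  # guard: single-token vocabulary
    return {w: v / top for w, v in info.items()}


def keep_prob(t, T, info, lam=0.3):
    """alpha_bar_t for one token: probability it is still unmasked at step t.

    Assumed schedule (not the paper's formula): linear base 1 - t/T, lowered
    by lam * sin(pi*t/T) * info so higher-information tokens are absorbed
    earlier. Endpoints stay fixed (1 at t=0, 0 at t=T), and lam <= 1/pi keeps
    the curve non-increasing for info in [0, 1].
    """
    a = 1.0 - t / T - lam * math.sin(math.pi * t / T) * info
    return min(1.0, max(0.0, a))  # clamp away floating-point overshoot


def q_sample(x0, t, T, info, rng):
    """Draw x_t ~ q(x_t | x_0) for absorbing-state diffusion: each token
    independently survives with probability alpha_bar_t, else becomes [MASK]."""
    return [tok if rng.random() < keep_prob(t, T, info.get(tok, 0.0)) else MASK
            for tok in x0]


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
    info = self_information(corpus)
    rng = random.Random(0)
    T = 10
    for t in (0, 5, 10):
        print(t, q_sample(corpus[0], t, T, info, rng))
```

Running the script prints the sentence fully intact at t = 0 and fully masked at t = T. At intermediate steps, rarer (higher-information) tokens have a lower survival probability. The constraint lam <= 1/pi keeps every token's survival probability non-increasing in t, which a valid absorbing chain requires.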
