A Cheaper and Better Diffusion Language Model with Soft-Masked Noise - Zhuanzhi Papers

Text data · Continuous space · Noise · Discrete · Diffusion models

April 10, 2023

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

From arXiv. Code is available at https://github.com/amazon-science/masked-diffusion-lm

Diffusion models based on iterative denoising have recently been proposed and applied to various generation tasks, such as image generation. However, because they are inherently designed for continuous data, existing diffusion models still have limitations in modeling discrete data such as language. For example, the commonly used Gaussian noise cannot handle discrete corruption well, and objectives defined in continuous spaces are unstable for textual data in the diffusion process, especially when the dimension is high. To alleviate these issues, we introduce Masked-Diffuse LM, a novel diffusion model for language modeling that is inspired by linguistic features of languages and offers lower training cost and better performance. Specifically, we design a linguistically informed forward process that corrupts the text through strategic soft-masking to better noise the textual data. We also directly predict the categorical distribution with a cross-entropy loss at every diffusion step, connecting the continuous space and the discrete space in a more efficient and straightforward way. Through experiments on 5 controlled generation tasks, we demonstrate that Masked-Diffuse LM achieves better generation quality than state-of-the-art diffusion models, with better efficiency.
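The two ideas named in the abstract, a soft-masking forward process and a per-step cross-entropy objective, can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). The function names `soft_mask` and `cross_entropy`, the importance scores, the rule that higher-scored tokens are noised first, the linear noise schedule, and the dot-product "denoiser" are all assumptions made for illustration only.

```python
import numpy as np


def soft_mask(embeddings, importance, t, num_steps, rng):
    """Toy forward step: noise only a subset of token embeddings at step t.

    Assumption: tokens are ranked by a hypothetical importance score and
    the top ceil(n * t / num_steps) tokens are the ones corrupted.
    """
    n = len(importance)
    order = np.argsort(-importance)           # most important first
    k = int(np.ceil(n * t / num_steps))       # how many tokens to corrupt
    mask = np.zeros(n, dtype=bool)
    mask[order[:k]] = True

    alpha = 1.0 - t / num_steps               # assumed linear schedule
    noise = rng.standard_normal(embeddings.shape)
    noised = np.sqrt(alpha) * embeddings + np.sqrt(1.0 - alpha) * noise
    # "Soft" mask: selected rows are blended with Gaussian noise,
    # unselected rows keep their original embedding.
    return np.where(mask[:, None], noised, embeddings), mask


def cross_entropy(logits, targets):
    """Mean categorical cross-entropy over token positions."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


# Tiny demo with made-up data.
rng = np.random.default_rng(0)
vocab_size, dim = 6, 4
table = rng.standard_normal((vocab_size, dim))    # toy embedding table
tokens = np.array([2, 5, 0, 3])                   # toy token ids
importance = np.array([0.1, 0.9, 0.5, 0.2])       # hypothetical scores

x_t, mask = soft_mask(table[tokens], importance, t=2, num_steps=4, rng=rng)
logits = x_t @ table.T     # stand-in for a learned denoising network
loss = cross_entropy(logits, tokens)
print("corrupted positions:", mask, "loss:", float(loss))
```

In this sketch the loss is computed directly against the discrete token ids at the chosen step, which is one simple way to read "predict the categorical distribution with a cross-entropy loss at every diffusion step"; a real model would replace the dot product with a trained network.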
