Diffusion models have achieved state-of-the-art synthesis quality on visual and audio tasks, and recent works adapt them to textual data by diffusing in the embedding space. However, the mismatch between the continuous data space and the embedding space poses challenges to the diffusion model that have not been carefully explored. In this paper, we conduct systematic studies and identify three such challenges. First, the data distribution is learnable for embeddings, which may lead to the collapse of the loss function. Second, as the norm of an embedding varies between popular and rare words, adding noise of the same scale leads to sub-optimal results. In addition, we find that noise sampled from a standard Gaussian distribution may distract the diffusion process. To address these challenges, we propose Difformer, a Transformer-based denoising diffusion probabilistic model that incorporates three techniques: an anchor loss function, a layer normalization module for embeddings, and a norm factor applied to the Gaussian noise. The techniques are complementary, and together they are critical to boosting model performance. Experiments are conducted on benchmark datasets for two seminal text generation tasks, machine translation and text summarization. The results show that Difformer significantly outperforms embedding-diffusion baselines while achieving results competitive with strong autoregressive baselines.
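A minimal sketch of the forward (noising) step implied by the second and third techniques: embeddings are layer-normalized so popular and rare words have comparable norms, and the Gaussian noise is rescaled by a norm factor before being mixed in. All names here (`layer_norm`, `q_sample`, `noise_factor`) are illustrative assumptions, not taken from the Difformer implementation.

```python
import math
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each embedding vector to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def q_sample(x0, alpha_bar_t, noise_factor=1.0, rng=None):
    """One standard DDPM forward step with a norm factor F on the noise:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * F * eps."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal(x0.shape)
    return (math.sqrt(alpha_bar_t) * x0
            + math.sqrt(1.0 - alpha_bar_t) * noise_factor * eps)

# Usage: noise a batch of 4 token embeddings of dimension 8.
emb = np.random.default_rng(1).standard_normal((4, 8))
x0 = layer_norm(emb)                               # equalize embedding norms
x_t = q_sample(x0, alpha_bar_t=0.5, noise_factor=2.0)  # enlarged noise scale
```

A `noise_factor` greater than 1 enlarges the noise relative to the (now unit-scale) embeddings; the specific value used by the paper is not reproduced here.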