While diffusion models have achieved great success in generating continuous signals such as images and audio, they still struggle to learn discrete sequence data such as natural language. Although recent advances circumvent the challenge of discreteness by embedding discrete tokens as continuous surrogates, they still fall short of satisfactory generation quality. To understand this, we first dive deep into the denoising training protocol of diffusion-based sequence generative models and identify three severe problems: 1) failing to learn, 2) lack of scalability, and 3) neglect of source conditions. We argue that these problems boil down to the pitfall of incompletely eliminated discreteness in the embedding space, in which the scale of the noise is decisive. In this paper, we introduce DINOISER, which facilitates diffusion models for sequence generation by manipulating noises. We propose to adaptively determine the range of sampled noise scales for counter-discreteness training, and to encourage the proposed diffused sequence learner to leverage source conditions with amplified noise scales during inference. Experiments show that, thanks to both the training and the inference strategies, DINOISER achieves consistent improvements over previous diffusion-based sequence generative models on several conditional sequence modeling benchmarks. Further analyses verify that DINOISER makes better use of source conditions to govern its generative process.
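To make the noise-scale manipulation concrete, the following is a minimal, hypothetical sketch in Python of the two ideas the abstract describes: clipping training-time noise scales above an adaptive lower bound, and amplifying the noise-level conditioning at inference. All names (`sample_noise_scales`, `update_sigma_min`, the adaptation rule, and the constants) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of counter-discreteness noise sampling (not the
# authors' implementation): keep an adaptive lower bound on the noise
# scale so training never samples noise too small to blur the residual
# discreteness of the embedding space.

rng = np.random.default_rng(0)

def sample_noise_scales(batch_size, sigma_min, sigma_max=20.0):
    """Sample per-sequence noise scales from the clipped range
    [sigma_min, sigma_max]; sigma_min is raised adaptively."""
    return rng.uniform(sigma_min, sigma_max, size=batch_size)

def update_sigma_min(sigma_min, recovery_acc, target_acc=0.99, step=0.05):
    """Illustrative adaptation rule (an assumption): if the denoiser
    recovers tokens almost perfectly at the current minimum scale, the
    task is too easy (discreteness leaks through), so raise the bound;
    otherwise relax it."""
    if recovery_acc > target_acc:
        return sigma_min + step
    return max(0.0, sigma_min - step)

def corrupt(embeddings, sigmas):
    """Diffuse token embeddings: z = x + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(embeddings.shape)
    return embeddings + sigmas[:, None, None] * eps

# Toy usage: a batch of 8 sequences, length 16, embedding dim 32.
x = rng.standard_normal((8, 16, 32))
sigma_min = 1.0
sigmas = sample_noise_scales(len(x), sigma_min)
z = corrupt(x, sigmas)

# At inference, the model's noise-level conditioning could be amplified
# (fed a larger scale than the one actually used) to push it to rely
# more on the source condition than on the noisy target embeddings.
```

The design intuition, as stated in the abstract, is that small noise scales leave the embedding space effectively discrete, so the denoiser can shortcut the task; enforcing a floor on the sampled scales removes that shortcut during training, while larger advertised scales at inference steer the model toward the source condition.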