Generative diffusion processes are an emerging and effective tool for image and speech generation. In existing methods, the underlying noise distribution of the diffusion process is Gaussian. However, fitting distributions with more degrees of freedom could improve the performance of such generative models. In this work, we investigate other types of noise distribution for the diffusion process. Specifically, we show that noise from a Gamma distribution provides improved results for image and speech generation. Moreover, we show that using a mixture of Gaussian noise variables in the diffusion process improves performance over a diffusion process that is based on a single distribution. Our approach preserves the ability to efficiently sample a state at an arbitrary step of the training diffusion process while using Gamma noise or a mixture of noise.
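To make the efficient-sampling claim concrete, the following is a minimal sketch of drawing a noisy state x_t directly from x_0 in a single step, first with Gaussian noise and then with zero-mean Gamma noise. The function names, the linear beta schedule, and the moment-matched Gamma parameterization (fixed shape `shape_k`, with the scale chosen so the noise variance equals 1 - alpha_bar_t) are illustrative assumptions and are not necessarily the parameterization used in this work.

```python
import numpy as np

def alpha_bar(betas):
    """Cumulative product of (1 - beta_t): how much of x_0 survives at step t."""
    return np.cumprod(1.0 - betas)

def sample_xt_gaussian(x0, t, a_bar, rng):
    """Standard closed form: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar[t]) * x0 + np.sqrt(1.0 - a_bar[t]) * eps

def sample_xt_gamma(x0, t, a_bar, rng, shape_k=4.0):
    """Same closed form with centered Gamma noise.

    Assumption (hypothetical, for illustration): the Gamma scale is chosen so
    that the noise variance k * scale^2 matches the Gaussian case, 1 - a_bar_t.
    """
    var = 1.0 - a_bar[t]
    scale = np.sqrt(var / shape_k)
    g = rng.gamma(shape_k, scale, size=x0.shape)
    # Subtract the Gamma mean (k * scale) so the added noise has zero mean.
    return np.sqrt(a_bar[t]) * x0 + (g - shape_k * scale)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear schedule
    a_bar = alpha_bar(betas)
    x0 = rng.standard_normal((4, 8))       # stand-in for a data sample
    print(sample_xt_gaussian(x0, 500, a_bar, rng).shape)
    print(sample_xt_gamma(x0, 500, a_bar, rng).shape)
```

In both cases the state at step t is obtained without simulating the t intermediate steps, which is the property the abstract says is preserved. The Gamma variant is skewed, with the shape parameter controlling how far it departs from the Gaussian.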