Diffusion probabilistic models have been shown to achieve state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space and are slow at generation. Variational Autoencoders (VAEs), on the other hand, provide access to a low-dimensional latent space but exhibit poor sample quality. Despite recent advances, VAEs usually require high-dimensional hierarchies of latent codes to generate high-quality samples. We present DiffuseVAE, a novel generative framework that integrates a VAE within a diffusion model framework and leverages it to design a novel conditional parameterization for diffusion models. We show that the resulting model improves upon the unconditional diffusion model in terms of sampling efficiency while also equipping diffusion models with the low-dimensional VAE-inferred latent code. Furthermore, we show that the proposed model can generate high-resolution samples and exhibits synthesis quality comparable to state-of-the-art models on standard benchmarks. Lastly, we show that the proposed method can be used for controllable image synthesis and exhibits out-of-the-box capabilities for downstream tasks such as image super-resolution and denoising. For reproducibility, our source code is publicly available at \url{https://github.com/kpandey008/DiffuseVAE}.
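To make the two-stage idea concrete, below is a minimal conceptual sketch (not the authors' implementation) of the generation pipeline the abstract describes: a VAE decodes a low-dimensional latent into a coarse sample, and a DDPM-style reverse process refines it while conditioning the denoiser on that coarse reconstruction. The module names (\texttt{vae\_decoder}, \texttt{denoiser}) and the conditioning-by-concatenation choice are illustrative assumptions, not details taken from the paper; only the standard DDPM ancestral-sampling step is textbook.

\begin{verbatim}
import torch

@torch.no_grad()
def diffusevae_sample(vae_decoder, denoiser, betas, latent_dim, shape):
    """Two-stage sampling sketch: VAE latent -> coarse image -> refinement."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    T = betas.shape[0]

    # Stage 1: draw a low-dimensional latent and decode a coarse sample.
    z = torch.randn(shape[0], latent_dim)      # interpretable latent code
    x_coarse = vae_decoder(z)                  # blurry VAE reconstruction

    # Stage 2: standard DDPM ancestral sampling, with the denoiser
    # conditioned on the coarse VAE output (here via channel concat --
    # an assumed conditioning mechanism for illustration).
    x = torch.randn(shape)                     # start from pure noise
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps = denoiser(torch.cat([x, x_coarse], dim=1), t_batch)
        # Posterior mean of the reverse step (epsilon parameterization).
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
               / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x, z                                # refined sample + its latent
\end{verbatim}

Because the final sample is tied to the latent code \texttt{z}, traversing or editing \texttt{z} gives the kind of controllable synthesis the abstract mentions, while the diffusion stage supplies the sample quality the VAE alone lacks.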