Diffusion models (DMs) have achieved state-of-the-art results in image synthesis as well as density estimation. Applying them in the latent space of a powerful pretrained autoencoder (LDM) significantly reduces their immense computational requirements without sacrificing sampling quality. However, DMs and LDMs lack a semantically meaningful representation space, since the diffusion process gradually destroys information in the latent variables. We introduce a framework for learning such representations with diffusion models (LRDM). To that end, an LDM is conditioned on the representation extracted from the clean image by a separate encoder. In particular, the DM and the representation encoder are trained jointly, so that the encoder learns rich representations tailored to the generative denoising process. By introducing a tractable representation prior, we can sample efficiently from the representation distribution for unconditional image synthesis without training any additional model. We demonstrate that (i) competitive image generation results can be achieved with image-parameterized LDMs, and (ii) LRDMs are capable of learning semantically meaningful representations that allow for faithful image reconstructions and semantic interpolations. Our implementation is available at https://github.com/jeremiastraub/diffusion.
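To make the joint training idea concrete, below is a minimal PyTorch sketch of one LRDM-style training step: a separate encoder maps the clean image to a representation z (with a standard-normal prior via a KL term, so z can later be sampled without any additional model), and a z-conditioned denoiser is trained with an image-parameterized (x0-prediction) objective. All module names, shapes, the toy networks, and the noise schedule here are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepresentationEncoder(nn.Module):
    """Maps a clean (latent) image to the parameters of q(z | x0)."""
    def __init__(self, in_ch=3, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_mu = nn.Linear(128, z_dim)
        self.to_logvar = nn.Linear(128, z_dim)

    def forward(self, x0):
        h = self.net(x0)
        return self.to_mu(h), self.to_logvar(h)

class ToyDenoiser(nn.Module):
    """Tiny z- and t-conditioned network standing in for the LDM's UNet."""
    def __init__(self, in_ch=3, z_dim=128, hidden=64, T=1000):
        super().__init__()
        self.T = T
        self.cond = nn.Linear(z_dim + 1, hidden)
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, in_ch, 3, padding=1),
        )

    def forward(self, x_t, t, z):
        # Broadcast the (z, t) conditioning over spatial dims as a shift.
        c = self.cond(torch.cat([z, t.float().unsqueeze(1) / self.T], dim=1))
        h = self.net[0](x_t) + c[:, :, None, None]
        for layer in self.net[1:]:
            h = layer(h)
        return h

def training_step(denoiser, encoder, x0, T=1000, kl_weight=1e-3):
    """Joint objective: x0-prediction denoising loss + KL to N(0, I) prior."""
    # Encode the *clean* image and reparameterize to get z.
    mu, logvar = encoder(x0)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    # DDPM-style forward diffusion at a random timestep (linear schedule).
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    betas = torch.linspace(1e-4, 0.02, T, device=x0.device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t].view(-1, 1, 1, 1)
    x_t = alpha_bar.sqrt() * x0 + (1.0 - alpha_bar).sqrt() * torch.randn_like(x0)
    # Image parameterization: the network predicts x0 directly.
    pred_x0 = denoiser(x_t, t, z)
    denoise_loss = F.mse_loss(pred_x0, x0)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return denoise_loss + kl_weight * kl

encoder, denoiser = RepresentationEncoder(), ToyDenoiser()
x0 = torch.randn(8, 3, 32, 32)  # stand-in for pretrained-autoencoder latents
loss = training_step(denoiser, encoder, x0)
loss.backward()
```

Because the KL term pulls q(z | x0) toward a standard normal, unconditional synthesis only requires drawing z ~ N(0, I) and running the usual reverse diffusion conditioned on it; no separate prior model over z needs to be trained.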