While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new model that generates images by iteratively inverting the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret a noise-relaxed solution of the forward heat equation as a variational approximation in a diffusion-like latent variable model. Our new model shows emergent qualitative properties not seen in standard diffusion models, such as data efficiency and the disentanglement of overall colour and shape in images. Spectral analysis on natural images highlights connections to diffusion models and reveals implicit inductive biases in them.
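To make the forward process concrete, here is a minimal sketch of how running the heat equation over an image erases fine-scale detail. It is an illustration under stated assumptions, not the implementation used in this work: it assumes an explicit finite-difference discretization with unit grid spacing, zero-flux (Neumann) boundaries obtained by replicating edge pixels, and a single-channel image. The names `heat_step` and `forward_heat`, and the parameters `dt`, `n_steps` and `sigma`, are hypothetical. The optional `sigma` adds Gaussian noise to the blurred image as a loose stand-in for the noise-relaxed solution mentioned above.

```python
import numpy as np


def heat_step(u, dt=0.2):
    """One explicit Euler step of du/dt = Laplacian(u) on a 2D grid.

    Assumption: unit grid spacing and zero-flux boundaries, implemented by
    replicating edge pixels. The scheme is stable for dt <= 0.25.
    """
    padded = np.pad(u, 1, mode="edge")
    laplacian = (
        padded[:-2, 1:-1] + padded[2:, 1:-1]      # neighbours above and below
        + padded[1:-1, :-2] + padded[1:-1, 2:]    # neighbours left and right
        - 4.0 * u
    )
    return u + dt * laplacian


def forward_heat(u0, n_steps, dt=0.2, sigma=0.0, rng=None):
    """Run the forward heat equation for n_steps, then optionally add noise.

    sigma is a hypothetical noise scale; sigma=0.0 gives the deterministic blur.
    """
    u = np.asarray(u0, dtype=float)
    for _ in range(n_steps):
        u = heat_step(u, dt)
    if sigma > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        u = u + sigma * rng.standard_normal(u.shape)
    return u


# Toy input: an 8x8 checkerboard, i.e. almost purely fine-scale structure.
image = np.indices((8, 8)).sum(axis=0) % 2.0
blurred = forward_heat(image, n_steps=50)
```

With zero-flux boundaries this discretization conserves the mean intensity while the pixel-to-pixel variation decays, so the checkerboard flattens towards a uniform grey. A generative model in the spirit of the abstract would learn to undo such steps one at a time, starting from the blurred end.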