While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new diffusion-like model that generates images through stochastically reversing the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret the solution of the forward heat equation with constant additive noise as a variational approximation in the diffusion latent variable model. Our new model shows emergent qualitative properties not seen in standard diffusion models, such as disentanglement of overall colour and shape in images. Spectral analysis on natural images highlights connections to diffusion models and reveals an implicit coarse-to-fine inductive bias in them.
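As an illustration of the forward process described above, the following is a minimal sketch of running the heat equation on a single-channel image for time `t` and then adding constant-variance Gaussian noise. The abstract does not specify the discretisation, so several choices here are assumptions made for illustration only: Neumann (zero-flux) boundary conditions, a solution in the cosine (DCT-II) basis, and the hypothetical names `dct_matrix`, `heat_forward`, and `sigma`.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows are cosine basis vectors)."""
    k = np.arange(n)[:, None]      # frequency index
    x = np.arange(n)[None, :]      # pixel index
    c = np.cos(np.pi * (x + 0.5) * k / n) * np.sqrt(2.0 / n)
    c[0, :] = np.sqrt(1.0 / n)     # constant (DC) basis vector
    return c

def heat_forward(u, t, sigma=0.0, rng=None):
    """Blur image u by heat dissipation for time t, then add Gaussian noise.

    Assumption: Neumann boundaries, so the Laplacian is diagonal in the
    DCT basis and each frequency decays as exp(-lambda * t).
    """
    h, w = u.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    coeffs = ch @ u @ cw.T                       # image -> frequency domain
    fy = np.pi * np.arange(h) / h
    fx = np.pi * np.arange(w) / w
    lam = fy[:, None] ** 2 + fx[None, :] ** 2    # Laplacian eigenvalues (negated)
    blurred = ch.T @ (np.exp(-lam * t) * coeffs) @ cw   # back to pixel domain
    if sigma > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        blurred = blurred + sigma * rng.standard_normal(u.shape)
    return blurred
```

Because high frequencies have larger eigenvalues, they decay fastest: fine detail disappears first, while the zero-frequency component (the mean intensity) is left unchanged. A generative model in this spirit would learn to reverse such steps stochastically, going from coarse to fine. For example, `heat_forward(img, t=2.0, sigma=0.01)` returns a blurred, slightly noisy copy of a 2D array `img`.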