Currently, applying diffusion models in the pixel space of high resolution images is difficult. Instead, existing approaches either run diffusion in a lower dimensional space (latent diffusion) or generate through multiple super-resolution stages, referred to as cascades. The downside is that these approaches add complexity to the diffusion framework. This paper aims to improve denoising diffusion for high resolution images while keeping the model as simple as possible. The paper is centered around the research question: how can one train a standard denoising diffusion model on high resolution images and still obtain performance comparable to these alternative approaches? The four main findings are: 1) the noise schedule should be adjusted for high resolution images, 2) it is sufficient to scale only a particular part of the architecture, 3) dropout should be added at specific locations in the architecture, and 4) downsampling is an effective strategy to avoid high resolution feature maps. Combining these simple yet effective techniques, we achieve state-of-the-art image generation on ImageNet among diffusion models without sampling modifiers.
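The abstract states that the noise schedule should be adjusted for high resolution images but does not say how. As a minimal sketch of one plausible adjustment, the code below shifts the log signal-to-noise ratio (log-SNR) of a cosine schedule by a resolution-dependent constant, so that larger images receive more noise at the same timestep. The function names, the reference resolution `base_res=64`, and the log-SNR bounds of ±15 are assumptions made for illustration and are not taken from the text above.

```python
import math


def logsnr_cosine(t, logsnr_min=-15.0, logsnr_max=15.0):
    """Log-SNR of a cosine schedule at time t in [0, 1].

    The bounds are illustrative assumptions; t=0 is the clean end
    (log-SNR = logsnr_max) and t=1 the noisy end (log-SNR = logsnr_min).
    """
    b = math.atan(math.exp(-0.5 * logsnr_max))
    a = math.atan(math.exp(-0.5 * logsnr_min)) - b
    return -2.0 * math.log(math.tan(a * t + b))


def logsnr_cosine_shifted(t, image_res, base_res=64):
    """Cosine log-SNR shifted by 2 * log(base_res / image_res).

    Assumption: the schedule is considered well tuned at base_res, and
    resolutions above it get a lower log-SNR (more noise) at every t.
    """
    return logsnr_cosine(t) + 2.0 * math.log(base_res / image_res)


def alpha_sigma(logsnr):
    """Variance-preserving scales, with alpha^2 = sigmoid(logsnr)."""
    alpha_sq = 1.0 / (1.0 + math.exp(-logsnr))
    return math.sqrt(alpha_sq), math.sqrt(1.0 - alpha_sq)


if __name__ == "__main__":
    # Compare the unshifted and shifted schedules at a few timesteps.
    for t in (0.1, 0.5, 0.9):
        base = logsnr_cosine(t)
        shifted = logsnr_cosine_shifted(t, image_res=256)
        alpha, sigma = alpha_sigma(shifted)
        print(f"t={t:.1f} base={base:.3f} shifted={shifted:.3f} "
              f"alpha={alpha:.3f} sigma={sigma:.3f}")
```

Under these assumptions, a 256 × 256 image is shifted by 2 · ln(64/256) ≈ −2.77 in log-SNR at every timestep, while an image at the reference resolution keeps the unshifted schedule.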