Recently, diffusion models have shown remarkable results in image synthesis by gradually removing noise and amplifying signals. Although this simple generative process works surprisingly well, is it the best way to generate image data? For instance, although human perception is more sensitive to the low frequencies of an image, diffusion models do not account for the relative importance of different frequency components. Therefore, to incorporate this inductive bias for image data, we propose a novel generative process that synthesizes images in a coarse-to-fine manner. First, we generalize standard diffusion models by enabling diffusion in a rotated coordinate system with a different velocity for each component of the vector. We further propose blur diffusion as a special case, in which each frequency component of an image is diffused at a different speed. Specifically, the proposed blur diffusion consists of a forward process that gradually blurs an image and adds noise, and a corresponding reverse process that progressively deblurs the image and removes noise. Experiments show that the proposed model outperforms the previous method in FID on the LSUN bedroom and church datasets. Code is available at https://github.com/sangyun884/blur-diffusion.
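To make the idea of "diffusion in a rotated coordinate system with a different velocity for each component" concrete, here is a minimal sketch of one possible blur-diffusion forward step. It is an illustration under stated assumptions, not the authors' implementation: the rotation is taken to be an orthonormal 2D DCT basis, each frequency (k1, k2) decays as exp(-blur_strength * t * lambda_k) with heat-equation-style eigenvalues lambda_k = pi^2 (k1^2/H^2 + k2^2/W^2), and isotropic Gaussian noise with standard deviation sigma_max * t is added. The function names `dct_matrix` and `blur_diffusion_forward` and the parameters `sigma_max` and `blur_strength` are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row gets the 1/sqrt(n) normalization
    return c


def blur_diffusion_forward(x0, t, sigma_max=1.0, blur_strength=1.0, rng=None):
    """Sample a noisy, blurred x_t from a single-channel image x0 (H x W), t in [0, 1].

    Hypothetical schedule (an assumption, not the paper's):
      * frequency (k1, k2) is scaled by exp(-blur_strength * t * lambda_k),
        lambda_k = pi^2 (k1^2/H^2 + k2^2/W^2), so high frequencies decay faster;
      * isotropic Gaussian noise with std sigma_max * t is added.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = x0.shape
    ch, cw = dct_matrix(h), dct_matrix(w)

    # Rotate into the frequency basis (V^T x0 in matrix notation).
    coeffs = ch @ x0 @ cw.T

    # Diagonal D_t: one decay rate ("velocity") per frequency component.
    k1 = np.arange(h)[:, None]
    k2 = np.arange(w)[None, :]
    lam = np.pi ** 2 * (k1 ** 2 / h ** 2 + k2 ** 2 / w ** 2)
    decay = np.exp(-blur_strength * t * lam)

    # Rotate back to pixel space (V D_t V^T x0): a blurred image.
    blurred = ch.T @ (decay * coeffs) @ cw

    # Because the DCT basis is orthonormal, isotropic noise looks the same in either basis.
    noise = rng.standard_normal(x0.shape)
    return blurred + sigma_max * t * noise
```

For example, `blur_diffusion_forward(x0, 0.5, rng=np.random.default_rng(0))` returns a partially blurred, partially noised copy of `x0`. Since the DC coefficient has lambda = 0, the noise-free part preserves the image mean, while higher frequencies vanish first, which is what produces the coarse-to-fine behavior when the process is reversed.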