We present SinDiffusion, which leverages denoising diffusion models to capture the internal distribution of patches from a single natural image. SinDiffusion significantly improves the quality and diversity of generated samples compared with existing GAN-based approaches. It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale, instead of the multiple models with progressively growing scales that serve as the default setting in prior work. This avoids the accumulation of errors, which causes characteristic artifacts in the generated results. Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics; we therefore redesign the network structure of the diffusion model. Coupling these two designs enables us to generate photorealistic and diverse images from a single image. Furthermore, owing to the inherent capabilities of diffusion models, SinDiffusion can be applied to various tasks such as text-guided image generation and image outpainting. Extensive experiments on a wide range of images demonstrate the superiority of our proposed method for modeling the patch distribution.
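The two core ideas, training a standard denoising diffusion process on a single image and limiting the denoiser to a patch-level receptive field, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; it assumes the standard DDPM forward noising schedule, and the image, schedule parameters, and `receptive_field` helper are hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a single natural image: one 64x64 grayscale image in [-1, 1].
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))

# Standard DDPM forward process (assumption, not SinDiffusion-specific):
# x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with a linear beta schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def noisy_sample(x0, t, eps):
    # Sample from q(x_t | x_0) in closed form.
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

eps = rng.standard_normal(x0.shape)
x_t = noisy_sample(x0, 500, eps)  # one training input for the denoiser

# Patch-level receptive field: a stack of L 3x3 convolutions (stride 1,
# no downsampling) sees only a (2L + 1) x (2L + 1) window, so the denoiser
# is forced to model patch statistics rather than the whole image layout.
def receptive_field(num_3x3_layers):
    return 2 * num_3x3_layers + 1

print(receptive_field(8))  # 8 such layers -> 17x17 patches
```

A denoiser built this way is trained on noisy versions of the one image, and the bounded receptive field is what yields diverse samples rather than memorized copies.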