Diffusion probabilistic models have been successful in generating high-quality and diverse images. However, traditional models, whose input and output are high-resolution images, suffer from excessive memory requirements, making them less practical for edge devices. Previous work on generative adversarial networks proposed a patch-based method that uses positional encoding and global content information. Nevertheless, designing a patch-based approach for diffusion probabilistic models is non-trivial. In this paper, we present a diffusion probabilistic model that generates images on a patch-by-patch basis. We propose two conditioning methods for patch-based generation. First, we propose position-wise conditioning using a one-hot representation to ensure that patches are generated in their proper positions. Second, we propose Global Content Conditioning (GCC) to ensure that patches have coherent content when concatenated together. We evaluate our model qualitatively and quantitatively on the CelebA and LSUN bedroom datasets and demonstrate a moderate trade-off between maximum memory consumption and generated image quality. Specifically, when an entire image is divided into 2 × 2 patches, our proposed approach reduces maximum memory consumption by half while maintaining comparable image quality.
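To make the two conditioning signals concrete, the following is a minimal sketch, not the authors' implementation, assuming a 2 × 2 patch grid, 64 × 64 patches, and a channels-first tensor layout. The helper names `make_position_channels` and `make_global_content` are hypothetical; the idea is that a patch-based denoiser would receive these as extra input channels concatenated to the noisy patch.

```python
import torch

def make_position_channels(row, col, grid=2, patch_size=64):
    """One-hot position conditioning (assumed form): one channel per grid
    cell, set to 1 over the whole patch for the cell this patch occupies."""
    pos = torch.zeros(grid * grid, patch_size, patch_size)
    pos[row * grid + col] = 1.0
    return pos

def make_global_content(image, patch_size=64):
    """Global Content Conditioning (GCC, assumed form): a low-resolution
    view of the whole image, resized to the patch resolution so every
    patch is conditioned on the same coarse global content."""
    return torch.nn.functional.interpolate(
        image.unsqueeze(0), size=(patch_size, patch_size),
        mode="bilinear", align_corners=False).squeeze(0)

# Usage: build the conditioned input for the top-right patch of a
# 128 x 128 RGB image (values here are random placeholders).
image = torch.rand(3, 128, 128)       # full image seen by GCC
noisy_patch = torch.rand(3, 64, 64)   # noisy patch fed to the denoiser
cond = torch.cat([noisy_patch,
                  make_position_channels(row=0, col=1),
                  make_global_content(image)], dim=0)
print(cond.shape)  # torch.Size([10, 64, 64]) = 3 + 4 + 3 channels
```

Because the denoiser only ever processes one 64 × 64 patch (plus these thin conditioning channels) instead of the full 128 × 128 image, peak activation memory scales with the patch size rather than the image size, which is the source of the memory savings the abstract reports.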