Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks compared with Generative Adversarial Nets (GANs). Recent work on semantic image synthesis mainly follows the \emph{de facto} GAN-based approaches, which may lead to unsatisfactory quality or diversity of generated images. In this paper, we propose a novel framework based on DDPMs for semantic image synthesis. Previous conditional diffusion models directly feed the semantic layout and the noisy image together as input to a U-Net structure, which may not fully leverage the information in the input semantic mask. In contrast, our framework processes the semantic layout and the noisy image differently: it feeds the noisy image to the encoder of the U-Net structure, while the semantic layout is injected into the decoder through multi-layer spatially-adaptive normalization operators. To further improve the generation quality and semantic interpretability in semantic image synthesis, we introduce the classifier-free guidance sampling strategy, which takes the scores of an unconditional model into account during the sampling process. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our proposed method, which achieves state-of-the-art performance in terms of fidelity (FID) and diversity (LPIPS).
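The two mechanisms named above can be illustrated with a minimal NumPy sketch. It is not the implementation evaluated in the paper and rests on simplifying assumptions: the function names are hypothetical, and the per-class lookup tables `gamma_table` and `beta_table` stand in for the learned layers that a real spatially-adaptive normalization block would apply to the semantic layout. The guidance function uses the commonly stated form of classifier-free guidance, in which the conditional noise estimate is pushed away from the unconditional one by a scale factor; the exact form and scale used in this work are not given in the abstract.

```python
import numpy as np


def spatially_adaptive_norm(features, layout, gamma_table, beta_table, eps=1e-5):
    """Normalize decoder features, then modulate them per pixel from the layout.

    features:    (C, H, W) decoder activations
    layout:      (H, W) integer class id per pixel
    gamma_table: (num_classes, C) per-class scale (assumption: replaces the
                 learned layers that would map the layout to a scale map)
    beta_table:  (num_classes, C) per-class shift (same assumption)
    """
    # Parameter-free normalization over the spatial axes of each channel.
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)

    # Look up a scale and a shift for every pixel from its class label,
    # then move the channel axis to the front: (H, W, C) -> (C, H, W).
    gamma = np.moveaxis(gamma_table[layout], -1, 0)
    beta = np.moveaxis(beta_table[layout], -1, 0)
    return normalized * (1.0 + gamma) + beta


def classifier_free_guidance(eps_cond, eps_uncond, scale):
    """Combine conditional and unconditional noise estimates.

    scale = 0 recovers the purely conditional estimate; larger values move
    the estimate further away from the unconditional one.
    """
    return eps_cond + scale * (eps_cond - eps_uncond)


# Usage with random stand-ins for network activations and noise estimates.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8))       # 4 channels, 8x8 feature map
layout = rng.integers(0, 3, size=(8, 8))    # 3 semantic classes
gamma_table = rng.normal(scale=0.1, size=(3, 4))
beta_table = rng.normal(scale=0.1, size=(3, 4))
modulated = spatially_adaptive_norm(features, layout, gamma_table, beta_table)

eps_cond = rng.normal(size=(3, 8, 8))       # estimate given the layout
eps_uncond = rng.normal(size=(3, 8, 8))     # estimate with the layout dropped
eps_guided = classifier_free_guidance(eps_cond, eps_uncond, scale=1.5)
```

Because the scale and shift vary with the class label at each pixel, the layout shapes the decoder features location by location instead of being concatenated once with the noisy input.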