Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks compared with Generative Adversarial Nets (GANs). Recent work on semantic image synthesis, however, mainly follows the \emph{de facto} GAN-based approaches, which may lead to unsatisfactory quality or diversity in the generated images. In this paper, we propose a novel DDPM-based framework for semantic image synthesis. Previous conditional diffusion models feed the semantic layout and the noisy image directly into a U-Net as joint input, which may not fully leverage the information in the semantic mask. Our framework instead processes the two inputs differently: the noisy image is fed to the encoder of the U-Net, while the semantic layout is injected into the decoder through multi-layer spatially-adaptive normalization operators. To further improve generation quality and semantic interpretability, we introduce a classifier-free guidance sampling strategy, which incorporates the scores of an unconditional model into the sampling process. Extensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance in terms of fidelity~(FID) and diversity~(LPIPS).
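The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the per-class lookup tables `gamma` and `beta`, the single-channel normalization, and the `(1 + gamma)` scaling convention are all assumptions made for illustration. A real spatially-adaptive normalization layer would predict the modulation maps from the layout with small convolutions, and the guidance formula shown is one common way of combining conditional and unconditional noise estimates.

```python
from statistics import fmean, pstdev


def spade_modulate(features, label_map, gamma, beta, eps=1e-5):
    """Normalize one feature channel, then scale and shift each pixel by class.

    features:  flat list of activations for a single channel
    label_map: flat list of integer class ids, same length as features
    gamma, beta: hypothetical per-class lookup tables (assumption; a real
                 layer would predict these maps from the semantic layout)
    """
    mean = fmean(features)
    std = pstdev(features)
    out = []
    for x, c in zip(features, label_map):
        normalized = (x - mean) / (std + eps)
        # Spatially varying modulation: each pixel uses its own class's values.
        out.append(normalized * (1.0 + gamma[c]) + beta[c])
    return out


def guided_noise(eps_cond, eps_uncond, scale):
    """Classifier-free guidance: start from the unconditional estimate and
    move toward (or past) the conditional one. scale = 1 gives the
    conditional estimate, scale = 0 the unconditional one."""
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    # Two pixels, two classes: class 1 is amplified, class 0 is shifted.
    print(spade_modulate([1.0, 3.0], [0, 1], {0: 0.0, 1: 1.0}, {0: 0.5, 1: 0.0}))
    # Guidance scale above 1 extrapolates beyond the conditional estimate.
    print(guided_noise([1.0, 2.0], [0.0, 1.0], 2.0))
```

In this sketch the layout never passes through the normalization itself; it only selects the scale and shift applied afterwards, which is the sense in which the mask information is preserved rather than washed out.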