Conditional generative adversarial networks (cGANs) aim to synthesize diverse images given input conditions and latent codes, but they usually suffer from mode collapse. To address this issue, previous works mainly focused on encouraging the correlation between latent codes and their generated images, while ignoring the relations between images generated from different latent codes. The recent MSGAN attempted to encourage the diversity of generated images, but it only considers "negative" relations between image pairs. In this paper, we propose a novel DivCo framework to properly constrain both "positive" and "negative" relations between the generated images specified in the latent space. To the best of our knowledge, this is the first attempt to use contrastive learning for diverse conditional image synthesis. We introduce a novel latent-augmented contrastive loss, which encourages images generated from adjacent latent codes to be similar and those generated from distinct latent codes to be dissimilar. The proposed latent-augmented contrastive loss is compatible with various cGAN architectures. Extensive experiments demonstrate that DivCo produces more diverse images than state-of-the-art methods without sacrificing visual quality on multiple unpaired and paired image generation tasks.
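The latent-augmented contrastive loss described above can be illustrated with a minimal InfoNCE-style sketch. This is not the paper's implementation: it assumes image features have already been extracted as vectors, and the function names (`cosine_sim`, `latent_augmented_contrastive_loss`) and the temperature value are illustrative. The query feature comes from an image generated with latent code z, the positive from an image generated with a nearby (perturbed) latent code, and the negatives from images generated with distinct latent codes.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def latent_augmented_contrastive_loss(query, positive, negatives, tau=0.07):
    """InfoNCE-style contrastive loss (illustrative sketch).

    query:     feature of the image generated from latent code z
    positive:  feature of the image from an adjacent latent code z + eps
    negatives: features of images from distinct latent codes
    tau:       temperature (hypothetical value, not from the paper)
    """
    pos = np.exp(cosine_sim(query, positive) / tau)
    neg = sum(np.exp(cosine_sim(query, n) / tau) for n in negatives)
    # Loss is small when the query is close to the positive and far
    # from all negatives; minimizing it pulls adjacent-latent images
    # together and pushes distinct-latent images apart.
    return -np.log(pos / (pos + neg))

# Toy illustration with 2-D "features":
q = np.array([1.0, 0.0])
aligned_pos = np.array([0.9, 0.1])      # similar to the query
misaligned_pos = np.array([0.0, 1.0])   # orthogonal to the query
negs = [np.array([0.0, 1.0]), np.array([-1.0, 0.0])]

loss_good = latent_augmented_contrastive_loss(q, aligned_pos, negs)
loss_bad = latent_augmented_contrastive_loss(q, misaligned_pos, negs)
```

In a real cGAN training loop this term would be added to the usual adversarial objective, with the positive pair produced by sampling a small perturbation of the latent code.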