Most conditional generation tasks expect diverse outputs given a single conditional context. However, conditional generative adversarial networks (cGANs) often focus on the prior conditional information and ignore the input noise vectors, which would otherwise contribute to output variations. Recent attempts to resolve the mode collapse issue for cGANs are usually task-specific and computationally expensive. In this work, we propose a simple yet effective regularization term to address the mode collapse issue for cGANs. The proposed method explicitly maximizes the ratio of the distance between generated images to the distance between their corresponding latent codes, thus encouraging the generators to explore minor modes during training. This mode seeking regularization term is readily applicable to various conditional generation tasks without imposing training overhead or modifying the original network structures. We validate the proposed algorithm on three conditional image synthesis tasks, including categorical generation, image-to-image translation, and text-to-image synthesis, with different baseline models. Both qualitative and quantitative results demonstrate the effectiveness of the proposed regularization method in improving diversity without loss of quality.
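To make the described regularizer concrete, the sketch below illustrates one plausible implementation of the distance ratio in PyTorch. The function name `mode_seeking_loss`, the use of L1 distances, and the weight `lambda_ms` are assumptions for illustration, not the authors' exact code; the abstract only specifies that the ratio of image distance to latent-code distance is maximized by the generator.

```python
import torch

def mode_seeking_loss(img1, img2, z1, z2, eps=1e-5):
    """Sketch of a mode seeking regularizer: the ratio of the distance
    between two generated images to the distance between the latent codes
    that produced them. The generator maximizes this ratio so that distinct
    noise vectors map to distinct outputs. (Illustrative, not the paper's code.)"""
    d_img = torch.mean(torch.abs(img1 - img2))   # distance between generated images
    d_z = torch.mean(torch.abs(z1 - z2))         # distance between latent codes
    return d_img / (d_z + eps)                   # ratio to be maximized

# Hypothetical usage inside a generator update for a cGAN:
# z1, z2 ~ N(0, I) sampled for the same condition c
# img1, img2 = G(c, z1), G(c, z2)
# g_loss = adversarial_loss - lambda_ms * mode_seeking_loss(img1, img2, z1, z2)
```

Because the term only adds one extra forward pass per generator update and touches no network layers, it can in principle be dropped into existing cGAN training loops, which matches the claim that no architectural changes or significant training overhead are required.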