Generating photo-realistic images from a text description is a challenging problem in computer vision. Previous work has shown that Generative Adversarial Networks (GANs) can generate promising synthetic images conditioned on text. In this paper, we focus on category-consistent and relativistic diverse constraints to optimize the diversity of synthetic images. Based on these constraints, we propose a category-consistent and relativistic diverse conditional GAN (CRD-CGAN) that synthesizes $K$ photo-realistic images simultaneously. We use an attention loss and a diversity loss to improve the sensitivity of the GAN to word attention and noise. We then employ a relativistic conditional loss to estimate the probability that a synthetic image is relatively real or fake, which improves on the basic conditional loss. Finally, we introduce a category-consistent loss to alleviate over-category issues among the $K$ synthetic images. We evaluate our approach on the Birds-200-2011, Oxford-102 flower, and MSCOCO 2014 datasets; extensive experiments demonstrate the superiority of the proposed method over state-of-the-art methods in terms of the photorealism and diversity of the generated synthetic images.
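The relativistic conditional loss mentioned above scores each sample relative to the opposite class rather than in isolation. As a minimal sketch of the general relativistic-average adversarial loss this builds on (function names and the use of raw logits are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relativistic_d_loss(real_logits, fake_logits):
    """Discriminator loss: real samples should score higher than the
    average fake score, and fakes lower than the average real score.
    Inputs are raw (pre-sigmoid) discriminator logits."""
    eps = 1e-12  # guard against log(0)
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return -(np.log(sigmoid(real_rel) + eps).mean()
             + np.log(1.0 - sigmoid(fake_rel) + eps).mean())

def relativistic_g_loss(real_logits, fake_logits):
    """Generator loss: the roles are swapped, so fakes are pushed to
    look more realistic than the average real sample."""
    eps = 1e-12
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return -(np.log(sigmoid(fake_rel) + eps).mean()
             + np.log(1.0 - sigmoid(real_rel) + eps).mean())
```

When the discriminator cannot separate the two classes (equal logits), both losses sit at $2\log 2$; a conditional variant would additionally feed the text embedding into the discriminator before computing the logits.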