Generating consistent, high-quality images from text is essential for visual-language understanding. Although existing GAN-based methods achieve impressive image quality, text-image consistency remains a major concern. In particular, the most popular metric, $R$-precision, may not accurately reflect text-image consistency, which often results in misleading semantics in the generated images. Despite its significance, the design of a better text-image consistency metric remains surprisingly under-explored in the community. In this paper, we take a step forward and develop a novel CLIP-based metric termed Semantic Similarity Distance (SSD), which is both theoretically grounded from a distributional viewpoint and empirically verified on benchmark datasets. Building on the proposed metric, we further design the Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which fuse semantic information at different granularities and capture accurate semantics. Equipped with two novel plug-and-play components, Hard-Negative Sentence Constructor and Semantic Projection, the proposed PDF-GAN can mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments show that, compared with current state-of-the-art methods, our PDF-GAN achieves significantly better text-image consistency while maintaining decent image quality on the CUB and COCO datasets.