Generating consistent and high-quality images from given texts is essential for visual-language understanding. Although impressive results have been achieved in generating high-quality images, text-image consistency remains a major concern for existing GAN-based methods. In particular, the most popular metric, $R$-precision, may not accurately reflect text-image consistency, which often results in generated images with highly misleading semantics. Despite its significance, how to design a better text-image consistency metric surprisingly remains under-explored in the community. In this paper, we take a step further and develop a novel CLIP-based metric termed Semantic Similarity Distance ($SSD$), which is both theoretically grounded from a distributional viewpoint and empirically verified on benchmark datasets. Building on the proposed metric, we further design Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which aim to improve text-image consistency by fusing semantic information at different granularities and capturing accurate semantics. Equipped with two novel plug-and-play components, the Hard-Negative Sentence Constructor and Semantic Projection, PDF-GAN can mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments on the CUB and COCO datasets shows that, compared with current state-of-the-art methods, PDF-GAN achieves significantly better text-image consistency while maintaining decent image quality.
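For readers unfamiliar with the $R$-precision protocol criticized above, the following is a minimal sketch of how it is commonly computed (the setup popularized by AttnGAN: rank each image's true caption against 99 randomly drawn mismatched captions and count a hit when the true caption ranks first). It assumes image and text embeddings have already been produced by some pretrained encoder pair; the function names `cosine` and `r_precision` and the `num_negatives`/`seed` parameters are hypothetical illustrations, not the paper's code, and this is not an implementation of $SSD$.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def r_precision(image_embs, text_embs, num_negatives=99, seed=0):
    """Sketch of R-precision with R = 1 (hypothetical helper).

    image_embs[i] is assumed to be the embedding of the image generated
    from caption i, and text_embs[i] the embedding of caption i.
    Returns the fraction of images whose own caption scores strictly
    higher than every randomly sampled mismatched caption.
    """
    rng = random.Random(seed)  # fixed seed so negatives are reproducible
    n = len(image_embs)
    hits = 0
    for i in range(n):
        # Sample mismatched captions from all other indices.
        others = [j for j in range(n) if j != i]
        negatives = rng.sample(others, min(num_negatives, len(others)))
        gt_score = cosine(image_embs[i], text_embs[i])
        neg_scores = [cosine(image_embs[i], text_embs[j]) for j in negatives]
        # A hit only requires beating randomly drawn captions.
        if all(gt_score > s for s in neg_scores):
            hits += 1
    return hits / n
```

Note that the score only checks whether the true caption outranks randomly drawn ones; it does not directly measure how closely fine-grained image content matches the caption.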