Generative Adversarial Networks (GANs) have revolutionized image synthesis, with applications such as face generation, photograph editing, and image super-resolution. Image synthesis using GANs has, however, been predominantly uni-modal, with few approaches that synthesize images from text or other data modalities. Text-to-image synthesis, and text-to-face synthesis in particular, has promising use cases such as robust face generation from eyewitness accounts and augmentation of the reading experience with visual cues. However, only a couple of datasets provide consolidated face data together with textual descriptions for text-to-face synthesis. Moreover, the textual annotations in these datasets are neither extensive nor highly descriptive, which limits the diversity of the faces generated from them. This paper empirically demonstrates that increasing the number of facial attributes in each textual description helps GANs generate more diverse and realistic-looking faces. To demonstrate this, we propose a new methodology that focuses on using structured textual descriptions. We also consolidate a Multi-Attributed and Structured Text-to-face (MAST) dataset consisting of high-quality images with structured textual annotations, and make it available for researchers to experiment with and build upon. Lastly, we report benchmark Fréchet Inception Distance (FID), Facial Semantic Similarity (FSS), and Facial Semantic Distance (FSD) scores for the MAST dataset.
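For the reader's reference, the Fréchet Inception Distance benchmarked here is standardly computed from Inception-v3 feature statistics of the real and generated image sets; the usual formulation (stated for context, not as a contribution of this paper) is

\[
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right),
\]

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the mean and covariance of the Inception features of the real and generated images, respectively; lower scores indicate that the generated distribution is closer to the real one.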