AI-driven image generation has improved significantly in recent years. Generative adversarial networks (GANs), such as StyleGAN, can generate high-quality, realistic images while also offering artistic control over the output. In this work, we present StyleT2F, a method for controlling the output of StyleGAN2 with text, enabling the generation of a detailed human face from a textual description. We utilize StyleGAN's latent space to manipulate different facial features and conditionally sample the required latent code, which embeds the facial features mentioned in the input text. Our method correctly captures the required features and shows consistency between the input text and the output images. Moreover, it guarantees disentanglement when manipulating a wide range of facial features sufficient to describe a human face.
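To make the latent-space manipulation concrete, the following is a minimal illustrative sketch of linear editing in a StyleGAN2-style intermediate latent space W, where each facial feature corresponds to a direction vector and conditioning amounts to shifting the latent code along the directions named in the text. The names `feature_directions`, `edit_latent`, and the commented-out `generator` call are hypothetical placeholders for illustration, not the paper's released code; the random directions stand in for directions one would learn (e.g., with a linear probe over annotated latents).

```python
import numpy as np

LATENT_DIM = 512  # dimensionality of StyleGAN2's intermediate W space
rng = np.random.default_rng(0)

# Hypothetical unit direction vectors in W space, one per facial
# attribute; in practice these would be learned, not random.
feature_directions = {
    "smile": rng.standard_normal(LATENT_DIM),
    "beard": rng.standard_normal(LATENT_DIM),
}
for name, d in feature_directions.items():
    feature_directions[name] = d / np.linalg.norm(d)

def edit_latent(w, features, strength=3.0):
    """Shift a latent code along each requested feature direction:
    w' = w + strength * d_feature. Disentangled edits rely on the
    directions being approximately orthogonal in W space."""
    w_edited = w.copy()
    for name in features:
        w_edited += strength * feature_directions[name]
    return w_edited

# Usage: sample a latent code, then condition it on the features
# extracted from the input text.
w = rng.standard_normal(LATENT_DIM)
w_conditioned = edit_latent(w, ["smile", "beard"])
# image = generator.synthesis(w_conditioned)  # hypothetical generator call
```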