We propose MUSE, a novel approach to illustrating textual attributes visually via portrait generation. MUSE takes as input a set of attributes written in text, in addition to facial features extracted from a photo of the subject. We define 11 attribute types to represent inspirations from a subject's profile, emotion, story, and environment. We design a novel stacked neural network architecture that extends an image-to-image generative model to accept textual attributes. Experiments show that our approach significantly outperforms several state-of-the-art methods that do not use textual attributes, increasing the Inception Score (IS) by 6% and decreasing the Fr\'echet Inception Distance (FID) by 11%. We also propose a new attribute reconstruction metric to evaluate whether the generated portraits preserve the subject's attributes. Experiments show that our approach can accurately illustrate 78% of the textual attributes, which also helps MUSE capture the subject in a more creative and expressive way.