Automatic synthesis of faces from visual attributes is an important problem in computer vision with wide applications in law enforcement and entertainment. With the advent of deep generative convolutional neural networks (CNNs), attempts have been made to synthesize face images from attributes and text descriptions. In this paper, we take a different approach, reformulating the original problem as a stage-wise learning problem. We first synthesize the facial sketch corresponding to the visual attributes, and then reconstruct the face image based on the synthesized sketch. The proposed Attribute2Sketch2Face framework, which combines a deep Conditional Variational Autoencoder (CVAE) with Generative Adversarial Networks (GANs), consists of three stages: (1) synthesis of a facial sketch from attributes using a CVAE architecture, (2) enhancement of the coarse sketch into a sharper sketch using a GAN-based framework, and (3) synthesis of the face from the sketch using another GAN-based network. Extensive experiments and comparisons with recent methods verify the effectiveness of the proposed attribute-based three-stage face synthesis method.
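The three-stage data flow described above (attributes → CVAE sketch decoder → GAN sketch refiner → GAN face generator) can be sketched schematically as follows. This is a minimal NumPy skeleton illustrating only the shapes and composition of the stages; the placeholder linear maps, the attribute dimension (40, as in CelebA-style attribute vectors), and the 64x64 image size are assumptions, and each stand-in function would be a trained deep CNN in the actual framework.

```python
import numpy as np

rng = np.random.default_rng(0)

ATTR_DIM, LATENT_DIM = 40, 64        # hypothetical attribute/latent sizes
SKETCH_SHAPE = FACE_SHAPE = (64, 64) # assumed image resolution

def stage1_cvae_decoder(attrs, z):
    """Stage 1: decode a coarse sketch from attributes + a latent sample.
    Stand-in for the trained CVAE decoder (here a fixed random linear map)."""
    w = rng.standard_normal((ATTR_DIM + LATENT_DIM, SKETCH_SHAPE[0] * SKETCH_SHAPE[1]))
    x = np.concatenate([attrs, z]) @ w
    return np.tanh(x).reshape(SKETCH_SHAPE)

def stage2_sketch_refiner(coarse_sketch):
    """Stage 2: GAN-based generator that sharpens the coarse sketch
    (stand-in: a mild contrast boost keeping values in [-1, 1])."""
    return np.clip(coarse_sketch * 1.5, -1.0, 1.0)

def stage3_face_generator(sharp_sketch, attrs):
    """Stage 3: GAN-based generator mapping the refined sketch
    (plus attributes) to a face image."""
    w = rng.standard_normal((SKETCH_SHAPE[0] * SKETCH_SHAPE[1] + ATTR_DIM,
                             FACE_SHAPE[0] * FACE_SHAPE[1]))
    x = np.concatenate([sharp_sketch.ravel(), attrs]) @ w
    return np.tanh(x).reshape(FACE_SHAPE)

# End-to-end pipeline: binary attribute vector + latent noise -> face image.
attrs = rng.integers(0, 2, ATTR_DIM).astype(float)
z = rng.standard_normal(LATENT_DIM)
face = stage3_face_generator(stage2_sketch_refiner(stage1_cvae_decoder(attrs, z)), attrs)
print(face.shape)  # (64, 64)
```

The stage-wise decomposition means each network solves a simpler conditional mapping than direct attribute-to-face synthesis, which is the motivation stated in the abstract.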