Despite significant progress in recent years, very few AI-based talking face generation methods attempt to render natural emotions. Moreover, their scope is largely limited by the characteristics of the training dataset, so they fail to generalize to arbitrary unseen faces. In this paper, we propose a one-shot, facial-geometry-aware emotional talking face generation method that generalizes to arbitrary faces. We propose a graph convolutional neural network that uses speech content features, along with an independent emotion input, to generate emotion- and speech-induced motion on a facial-geometry-aware landmark representation. This representation is further used in our optical-flow-guided texture generation network to produce the texture. We propose a two-branch texture generation network, with motion and texture branches designed to handle motion and texture content independently. Compared with previous emotional talking face methods, our method can adapt to arbitrary faces captured in the wild by fine-tuning on only a single image of the target identity in a neutral expression.