Although significant progress has been made in audio-driven talking face generation, existing methods either neglect facial emotion or cannot be applied to arbitrary subjects. In this paper, we propose the Emotion-Aware Motion Model (EAMM) to generate one-shot emotional talking faces by involving an emotion source video. Specifically, we first propose an Audio2Facial-Dynamics module, which renders talking faces from audio-driven unsupervised zero- and first-order key-point motion. Then, by exploring the motion model's properties, we further propose an Implicit Emotion Displacement Learner that represents emotion-related facial dynamics as linearly additive displacements to the previously acquired motion representations. Comprehensive experiments demonstrate that, by incorporating the results of both modules, our method generates satisfactory talking face results on arbitrary subjects with realistic emotion patterns.
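To make the additive formulation concrete, the following minimal sketch (our own illustration, not the authors' released code; all function names, variable names, and tensor shapes are assumptions) shows how emotion-conditioned displacements could be added linearly to the audio-driven zero-order (key-point) and first-order (Jacobian) motion before rendering.

```python
# Minimal sketch of the linearly additive motion composition described in the
# abstract: emotion-related facial dynamics are modeled as displacements added
# to audio-driven key-point motion. All names and shapes here are illustrative.
import torch

def compose_motion(audio_keypoints: torch.Tensor,
                   audio_jacobians: torch.Tensor,
                   emotion_kp_disp: torch.Tensor,
                   emotion_jac_disp: torch.Tensor):
    """Combine zero-order (key-point) and first-order (Jacobian) motion
    predicted from audio with emotion-conditioned displacements."""
    keypoints = audio_keypoints + emotion_kp_disp   # zero-order motion + displacement
    jacobians = audio_jacobians + emotion_jac_disp  # first-order motion + displacement
    return keypoints, jacobians

# Example shapes (assumed): 1 frame, K = 10 key-points in 2-D, a 2x2 Jacobian per key-point.
kp_audio = torch.randn(1, 10, 2)
jac_audio = torch.randn(1, 10, 2, 2)
kp_emo = torch.randn(1, 10, 2)
jac_emo = torch.randn(1, 10, 2, 2)
kp, jac = compose_motion(kp_audio, jac_audio, kp_emo, jac_emo)
```

Under this reading, the emotion branch never replaces the audio-driven motion; it only perturbs it, which is what allows the two modules to be trained and combined for arbitrary subjects.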