Recently, audio-driven talking face video generation has attracted considerable attention. However, very few studies address emotional editing of these talking face videos with continuously controllable expressions, a capability in strong demand in industry. The challenge is that speech-related expressions and emotion-related expressions are often highly coupled. Moreover, traditional image-to-image translation methods do not work well in this application because expressions are coupled with other attributes such as pose, i.e., translating the expression of the character in each frame may simultaneously change the head pose due to bias in the training data distribution. In this paper, we propose a high-quality facial expression editing method for talking face videos that allows the user to continuously control the target emotion in the edited video. We treat this task as a special case of motion information editing, where we use a 3DMM to capture major facial movements and an associated texture map modeled by a StyleGAN to capture appearance details. Both representations (3DMM and texture map) contain emotional information, can be continuously modified by neural networks, and can be easily smoothed by averaging in coefficient/latent spaces, making our method simple yet effective. We also introduce a mouth shape preservation loss to control the trade-off between lip synchronization and the degree of exaggeration of the edited expression. Extensive experiments and a user study show that our method achieves state-of-the-art performance across various evaluation criteria.
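To make the idea of smoothing by averaging in coefficient space concrete, the following is a minimal sketch rather than the method described above. It assumes each frame is represented by a vector of expression coefficients (for example, 3DMM expression parameters) stacked into an array of shape `(num_frames, dim)`. The function names `smooth_coefficients` and `blend_emotion`, the window size, and the use of plain linear interpolation for continuous emotion intensity are all illustrative assumptions; in the abstract, the modification itself is performed by neural networks, which this sketch does not model.

```python
import numpy as np


def smooth_coefficients(coeffs: np.ndarray, window: int = 5) -> np.ndarray:
    """Temporally smooth per-frame coefficient vectors with a centered moving average.

    coeffs: array of shape (num_frames, dim), one coefficient vector per frame.
    window: odd number of frames to average over (hypothetical default).
    Edge frames are handled by repeating the first and last frame.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Repeat the boundary frames so the output keeps the original length.
    padded = np.pad(coeffs, ((half, half), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    # Average each coefficient dimension independently along the time axis.
    columns = [
        np.convolve(padded[:, d], kernel, mode="valid")
        for d in range(coeffs.shape[1])
    ]
    return np.stack(columns, axis=1)


def blend_emotion(source: np.ndarray, target: np.ndarray, intensity: float) -> np.ndarray:
    """Linearly interpolate between source and target coefficients.

    intensity is clipped to [0, 1]: 0 keeps the source expression,
    1 gives the target expression. This is an assumed stand-in for
    continuous emotion control, not the learned editing network.
    """
    t = float(np.clip(intensity, 0.0, 1.0))
    return (1.0 - t) * source + t * target


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    neutral = rng.normal(size=(30, 8))       # placeholder per-frame coefficients
    happy = neutral + 0.5                    # placeholder "edited" coefficients
    edited = blend_emotion(neutral, happy, intensity=0.6)
    smoothed = smooth_coefficients(edited, window=5)
    print(smoothed.shape)
```

Averaging in a low-dimensional coefficient space, rather than in pixel space, is what keeps this kind of smoothing cheap: frame-to-frame jitter is suppressed without blurring image detail, since the image is only rendered after the coefficients have been smoothed.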