In this paper, we introduce a novel deep learning method for photo-realistic manipulation of the emotional state of actors in "in-the-wild" videos. The proposed method is based on a parametric 3D face representation of the actor in the input scene, which offers a reliable disentanglement of facial identity from head pose and facial expressions. It then uses a novel deep domain translation framework that alters the facial expressions in a consistent and plausible manner, taking their dynamics into account. Finally, the altered facial expressions are used to photo-realistically manipulate the facial region in the input scene through a specially designed neural face renderer. To the best of our knowledge, our method is the first capable of controlling the actor's facial expressions even when given only the semantic labels of the target emotions as input, while at the same time preserving the speech-related lip movements. We conduct extensive qualitative and quantitative evaluations and comparisons, which demonstrate the effectiveness of our approach and the promising results it obtains. Our method opens a plethora of new possibilities for useful applications of neural rendering technologies, ranging from movie post-production and video games to photo-realistic affective avatars.
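To make the three-stage pipeline described above more concrete, the following is a minimal PyTorch sketch of how such components could fit together. This is an illustrative sketch only: all module names (`Face3DEncoder`, `EmotionTranslator`, `NeuralFaceRenderer`), parameter dimensions, label sets, and architectural details are our assumptions for exposition and are not the paper's actual implementation.

```python
# Hypothetical sketch of the three-stage emotion-manipulation pipeline:
# (1) 3D disentanglement -> (2) expression translation -> (3) neural
# rendering. Names, shapes, and layers are illustrative assumptions.
import torch
import torch.nn as nn


class Face3DEncoder(nn.Module):
    """Stage 1 (assumed): regress disentangled 3DMM-style parameters
    (identity, head pose, expression) from a video frame."""
    def __init__(self, id_dim=80, pose_dim=6, expr_dim=50):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleDict({
            "identity": nn.Linear(32, id_dim),
            "pose": nn.Linear(32, pose_dim),
            "expression": nn.Linear(32, expr_dim),
        })

    def forward(self, frame):
        feat = self.backbone(frame)
        return {name: head(feat) for name, head in self.heads.items()}


class EmotionTranslator(nn.Module):
    """Stage 2 (assumed): translate an expression sequence toward a
    target emotion label; a recurrent core models temporal dynamics."""
    def __init__(self, expr_dim=50, n_emotions=7, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_emotions, hidden)
        self.rnn = nn.GRU(expr_dim + hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, expr_dim)

    def forward(self, expr_seq, emotion_label):
        # expr_seq: (B, T, expr_dim); emotion_label: (B,) long tensor.
        cond = self.embed(emotion_label)                       # (B, hidden)
        cond = cond.unsqueeze(1).expand(-1, expr_seq.size(1), -1)
        h, _ = self.rnn(torch.cat([expr_seq, cond], dim=-1))
        # A residual output keeps the manipulation close to the source
        # motion, one plausible way to preserve speech-related lip movement.
        return expr_seq + self.out(h)


class NeuralFaceRenderer(nn.Module):
    """Stage 3 (assumed): re-render the facial region of the input frame
    conditioned on the manipulated expression parameters."""
    def __init__(self, expr_dim=50):
        super().__init__()
        self.fuse = nn.Conv2d(3 + expr_dim, 3, 3, padding=1)

    def forward(self, frame, expr):
        B, _, H, W = frame.shape
        expr_map = expr.view(B, -1, 1, 1).expand(-1, -1, H, W)
        return torch.sigmoid(self.fuse(torch.cat([frame, expr_map], dim=1)))


if __name__ == "__main__":
    B, T = 2, 16
    frames = torch.rand(B * T, 3, 64, 64)
    encoder, translator, renderer = (
        Face3DEncoder(), EmotionTranslator(), NeuralFaceRenderer())

    params = encoder(frames)
    expr_seq = params["expression"].view(B, T, -1)
    target = torch.tensor([3, 5])   # e.g. hypothetical "happy", "sad" labels
    new_expr = translator(expr_seq, target).reshape(B * T, -1)
    output = renderer(frames, new_expr)
    print(output.shape)             # torch.Size([32, 3, 64, 64])
```

Note the design point this sketch tries to reflect: because the emotion manipulation operates purely on the disentangled expression parameters (Stage 2), identity and head pose from Stage 1 pass through untouched, and conditioning on a semantic emotion label alone suffices to drive the translation.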