Editing and manipulating facial features in videos is an important research problem with many applications, ranging from movie post-production and visual effects to realistic avatars for video games and virtual assistants. Our method supports semantic video manipulation based on neural rendering and 3D-based facial expression modelling. We focus on interactive manipulation of videos by altering and controlling the facial expressions, and achieve promising photorealistic results. The proposed method is based on a disentangled representation and estimation of the 3D facial shape and activity, which gives the user intuitive, easy-to-use control over the facial expressions in the input video. We also introduce a user-friendly, interactive AI tool that takes human-readable semantic labels describing the desired expression manipulations in specific parts of the input video and synthesizes photorealistic manipulated videos. To achieve this, we map the emotion labels to points in the Valence-Arousal space (where Valence quantifies how positive or negative an emotion is and Arousal quantifies the strength of the emotion's activation), which in turn are mapped to disentangled 3D facial expressions through a specially designed and trained expression decoder network. The paper presents detailed qualitative and quantitative experiments that demonstrate the effectiveness of our system.
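The first stage of the label-driven pipeline described above, from emotion labels on video segments to per-frame Valence-Arousal targets, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the coordinates in `EMOTION_TO_VA`, the `intensity` parameter, and the helper names `label_to_va` and `va_track` are all assumptions made for this example. The trained expression decoder that turns these points into 3D facial expressions is not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

VAPoint = Tuple[float, float]  # (valence, arousal), each assumed to lie in [-1, 1]

# Illustrative coordinates only; these values are assumptions, not taken from the paper.
EMOTION_TO_VA: Dict[str, VAPoint] = {
    "neutral": (0.0, 0.0),
    "happy": (0.8, 0.5),
    "sad": (-0.7, -0.4),
    "angry": (-0.6, 0.7),
    "surprised": (0.3, 0.8),
}


def label_to_va(label: str, intensity: float = 1.0) -> VAPoint:
    """Map an emotion label to a VA point, scaled toward neutral (0, 0).

    `intensity` is a hypothetical control in [0, 1]; 0 yields neutral.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    valence, arousal = EMOTION_TO_VA[label.lower()]
    return (valence * intensity, arousal * intensity)


def va_track(
    num_frames: int, edits: List[Tuple[int, int, str]]
) -> List[Optional[VAPoint]]:
    """Build a per-frame list of VA targets from (start, end, label) edits.

    `end` is exclusive. Frames not covered by any edit are None, meaning
    the original expression would be left untouched. Later edits overwrite
    earlier ones where they overlap.
    """
    track: List[Optional[VAPoint]] = [None] * num_frames
    for start, end, label in edits:
        point = label_to_va(label)
        for frame in range(max(0, start), min(num_frames, end)):
            track[frame] = point
    return track
```

For example, `va_track(100, [(20, 60, "happy")])` marks frames 20 to 59 with the assumed "happy" point and leaves the rest as `None`. In the full system, each non-empty entry would then be passed to the expression decoder network.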