Facial video re-targeting is a challenging problem that aims to seamlessly modify the facial attributes of a target subject, driven by a monocular source sequence. We leverage the 3D geometry of faces and Generative Adversarial Networks (GANs) to design a novel deep learning architecture for the task of facial and head reenactment. Our method differs from purely 3D model-based approaches, as well as from recent image-based methods that use Deep Convolutional Neural Networks (DCNNs) to generate individual frames. With the aid of a sequential Generator and an ad-hoc Dynamics Discriminator network, we capture the complex non-rigid facial motion of the driving monocular performances and synthesise temporally consistent videos. We conduct a comprehensive set of quantitative and qualitative experiments and demonstrate that our method transfers facial expressions, head pose and eye gaze from a source video to a target subject in a photo-realistic and faithful fashion, outperforming other state-of-the-art methods. Most importantly, our system performs end-to-end reenactment at near real-time speed (18 fps).
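To make the adversarial setup concrete, below is a minimal PyTorch-style sketch of the two components named above: a sequential generator conditioned on the driving signal and its own previously generated frames, and a dynamics discriminator that scores short clips of consecutive frames rather than single images. The class names, layer choices, tensor shapes and conditioning inputs are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SequentialGenerator(nn.Module):
    """Sketch: synthesise frame t from the driving signal for frame t
    (e.g. a rendered 3D face conditioning image) plus a few previously
    generated frames, which provide temporal context."""
    def __init__(self, cond_ch=3, ctx_frames=2, img_ch=3, feat=64):
        super().__init__()
        in_ch = cond_ch + ctx_frames * img_ch
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, img_ch, 3, padding=1), nn.Tanh(),
        )

    def forward(self, cond, prev_frames):
        # cond:        (B, cond_ch, H, W)            driving conditioning image
        # prev_frames: (B, ctx_frames * img_ch, H, W) stacked previous outputs
        return self.net(torch.cat([cond, prev_frames], dim=1))

class DynamicsDiscriminator(nn.Module):
    """Sketch: score a clip of K consecutive frames with 3D convolutions,
    so the adversarial loss penalises temporal flicker and implausible
    motion, not just per-frame artefacts."""
    def __init__(self, img_ch=3, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(img_ch, feat, (3, 4, 4), stride=(1, 2, 2), padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(feat, 1, (3, 4, 4), stride=(1, 2, 2), padding=1),
        )

    def forward(self, clip):
        # clip: (B, img_ch, K, H, W) — K consecutive real or generated frames
        return self.net(clip)
```

In this sketch, training would alternate as in a standard GAN: generated frames are buffered into clips, and the dynamics discriminator is shown real and generated clips so that its gradient pushes the generator toward temporally consistent output.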