We present Free-HeadGAN, a person-generic neural talking head synthesis system. We show that modeling faces with sparse 3D facial landmarks is sufficient to achieve state-of-the-art generative performance, without relying on strong statistical priors of the face, such as 3D Morphable Models. Apart from 3D pose and facial expressions, our method is capable of fully transferring eye gaze from a driving actor to a source identity. Our complete pipeline consists of three components: a canonical 3D key-point estimator that regresses 3D pose and expression-related deformations, a gaze estimation network, and a generator built upon the architecture of HeadGAN. We further experiment with an extension of our generator that accommodates few-shot learning via an attention mechanism, for cases where more than one source image is available. Compared to the latest models for reenactment and motion transfer, our system achieves higher photo-realism combined with superior identity preservation, while offering explicit gaze control.
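To make the three-stage pipeline concrete, the following is a minimal structural sketch of how the components described above could compose. All function names, landmark counts, and tensor shapes are illustrative assumptions for exposition, not the authors' actual implementation; the real networks are learned models, stubbed here with placeholders.

```python
import numpy as np

# Hypothetical sketch of the Free-HeadGAN pipeline: a canonical 3D keypoint
# estimator, a gaze estimation network, and a HeadGAN-style generator.
# Shapes (68 landmarks, 256x256 images) are assumptions, not the paper's spec.

def estimate_canonical_keypoints(image):
    """Regress sparse canonical 3D landmarks, rigid head pose, and
    expression-related deformations (stubbed)."""
    keypoints = np.zeros((68, 3))        # canonical 3D facial landmarks
    pose = np.eye(4)                     # 6-DoF head pose (rotation + translation)
    deformation = np.zeros((68, 3))      # expression-related offsets
    return keypoints, pose, deformation

def estimate_gaze(image):
    """Predict a gaze direction, e.g. (pitch, yaw), enabling explicit
    gaze control (stubbed)."""
    return np.zeros(2)

def generate(source_image, keypoints, pose, deformation, gaze):
    """HeadGAN-style generator: render the source identity under the
    driving pose, expression, and gaze."""
    driven = keypoints + deformation                      # deform canonical landmarks
    driven = (pose[:3, :3] @ driven.T).T + pose[:3, 3]    # apply rigid head pose
    # A real generator conditions an image decoder on these driving signals;
    # here we return an image-shaped placeholder.
    return np.zeros_like(source_image)

def reenact(source_image, driving_image):
    """Full pipeline: extract motion and gaze from the driving frame,
    then synthesize the source identity under that motion."""
    kp, pose, deform = estimate_canonical_keypoints(driving_image)
    gaze = estimate_gaze(driving_image)
    return generate(source_image, kp, pose, deform, gaze)

out = reenact(np.zeros((256, 256, 3)), np.zeros((256, 256, 3)))
print(out.shape)
```

The few-shot extension mentioned above would replace the single `source_image` with a set of source frames aggregated by an attention mechanism before generation.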