We present a versatile model, FaceAnime, for various video generation tasks from still images. Video generation from a single face image is an interesting problem that is usually tackled by utilizing Generative Adversarial Networks (GANs) to integrate information from the input face image and a sequence of sparse facial landmarks. However, the generated face images usually suffer from quality loss, image distortion, identity change, and expression mismatch due to the weak representation capacity of the facial landmarks. In this paper, we propose to "imagine" a face video from a single face image according to the reconstructed 3D face dynamics, aiming to generate a realistic and identity-preserving face video with precisely predicted pose and facial expression. The 3D dynamics reveal changes in facial expression and motion, and can serve as strong prior knowledge for guiding highly realistic face video generation. In particular, we explore face video prediction and exploit a well-designed 3D dynamic prediction network to predict a 3D dynamic sequence from a single face image. The 3D dynamics are then further rendered by the sparse texture mapping algorithm to recover structural details and sparse textures for generating face frames. Our model is versatile for various AR/VR and entertainment applications, such as face video retargeting and face video prediction. Extensive experimental results demonstrate its effectiveness in generating high-fidelity, identity-preserving, and visually pleasing face video clips from a single source face image.
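The generation pipeline described above can be summarized as three stages: predicting a 3D dynamic sequence from the source image, rendering each dynamic state with sparse texture mapping, and synthesizing the output frames. The following is a minimal structural sketch of that flow; all function names, data shapes, and internals are hypothetical placeholders (the actual model uses learned networks for both prediction and rendering), shown only to make the stage ordering concrete.

```python
# Hypothetical sketch of the three-stage pipeline from the abstract:
# single image -> 3D dynamic sequence -> sparse-texture rendering -> frames.
# All names and computations here are illustrative placeholders, not the
# real FaceAnime networks.

def predict_3d_dynamics(face_params, num_frames):
    """Placeholder for the 3D dynamic prediction network: produce a
    sequence of 3D face states (pose + expression) from one image."""
    # Stand-in: perturb the source parameters a little per frame.
    return [[v + 0.01 * t for v in face_params] for t in range(num_frames)]

def sparse_texture_render(dynamic_state):
    """Placeholder for sparse texture mapping: recover structural
    details / sparse textures from one 3D dynamic state."""
    return [round(v, 3) for v in dynamic_state]

def generate_face_video(face_params, num_frames=4):
    """End-to-end flow: predict dynamics, then render each frame."""
    dynamics = predict_3d_dynamics(face_params, num_frames)
    return [sparse_texture_render(state) for state in dynamics]

frames = generate_face_video([0.5, 0.2, 0.8], num_frames=4)
assert len(frames) == 4  # one rendered frame per predicted 3D state
```

The key design point this sketch reflects is that the 3D dynamics act as an intermediate representation between the still image and the output video, replacing the sparse landmarks used by prior GAN-based methods.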