Face image animation from a single image has achieved remarkable progress. However, it remains challenging when only sparse landmarks are available as the driving signal. Given a source face image and a sequence of sparse face landmarks, our goal is to generate a video of the face imitating the motion of the landmarks. We develop an efficient and effective method for transferring motion from sparse landmarks to the face image, combining global and local motion estimation in a unified model to transfer the motion faithfully. The model learns to segment the moving foreground from the background and generates not only global motion, such as rotation and translation of the face, but also subtle local motion, such as gaze changes. We further improve face landmark detection on videos; with temporally better-aligned landmark sequences for training, our method generates temporally coherent videos with higher visual quality. Experiments show that we achieve results comparable to the state-of-the-art image-driven method in same-identity testing and better results in cross-identity testing.
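To make the combination of global and local motion concrete, below is a minimal sketch, not the authors' implementation, of one plausible way to warp a source image with a global affine motion plus a residual local flow, gated by a learned foreground mask so that the background stays static. It assumes a PyTorch-style pipeline; the function names, tensor shapes, and the specific affine parameterization are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of global + local motion transfer with a foreground mask.
# All names and shapes here are assumptions for illustration only.
import torch
import torch.nn.functional as F

def combine_global_local(source, theta, local_flow, fg_mask):
    """
    source:     (B, C, H, W) source face image
    theta:      (B, 2, 3)    global affine motion (e.g. rotation/translation)
    local_flow: (B, H, W, 2) residual flow for subtle local motion (e.g. gaze)
    fg_mask:    (B, 1, H, W) soft segmentation of the moving foreground
    """
    B, C, H, W = source.shape

    # Sampling grid for the global motion estimated from the sparse landmarks.
    global_grid = F.affine_grid(theta, (B, C, H, W), align_corners=False)

    # Identity grid keeps the background static.
    identity = torch.eye(2, 3).unsqueeze(0).repeat(B, 1, 1)
    identity_grid = F.affine_grid(identity, (B, C, H, W), align_corners=False)

    # Foreground follows global + local motion; background stays in place.
    fg_grid = global_grid + local_flow
    mask = fg_mask.permute(0, 2, 3, 1)            # (B, H, W, 1)
    grid = mask * fg_grid + (1.0 - mask) * identity_grid

    return F.grid_sample(source, grid, align_corners=False)

# Toy usage with random tensors, just to check that the shapes fit together.
if __name__ == "__main__":
    B, C, H, W = 1, 3, 256, 256
    src = torch.rand(B, C, H, W)
    theta = torch.tensor([[[1.0, 0.0, 0.05],
                           [0.0, 1.0, 0.00]]])    # slight horizontal shift
    flow = torch.zeros(B, H, W, 2)                # no local motion
    mask = torch.ones(B, 1, H, W)                 # treat everything as foreground
    out = combine_global_local(src, theta, flow, mask)
    print(out.shape)                              # torch.Size([1, 3, 256, 256])
```

In practice the affine parameters, residual flow, and mask would be predicted by networks conditioned on the source image and the driving landmarks; the sketch only shows how such predictions could be composed into a single warp.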