One-shot video-driven talking face generation aims to produce a synthetic talking video by transferring the facial motion from a video to an arbitrary portrait image. Head pose and facial expression are always entangled in facial motion and are transferred simultaneously. However, this entanglement prevents such methods from being used directly in video portrait editing, where one may need to modify only the expression while keeping the pose unchanged. One challenge in decoupling pose and expression is the lack of paired data, such as images with the same pose but different expressions. Only a few methods attempt to tackle this challenge, relying on 3D Morphable Models (3DMMs) for explicit disentanglement. But 3DMMs are not accurate enough to capture facial details because of their limited number of blendshapes, which adversely affects motion transfer. In this paper, we introduce a novel self-supervised disentanglement framework that decouples pose and expression without 3DMMs or paired data. It consists of a motion editing module, a pose generator, and an expression generator. The editing module projects faces into a latent space where pose motion and expression motion can be disentangled, so that pose or expression transfer can be performed conveniently in the latent space via addition. The two generators then render the modified latent codes into images. Moreover, to guarantee the disentanglement, we propose a bidirectional cyclic training strategy with well-designed constraints. Evaluations demonstrate that our method can control pose or expression independently and can be used for general video editing.
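The idea of transferring pose or expression "via addition" in a latent space can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no details of the encoder, the latent dimension, or the generators, so the plain-list latent codes, the function names `add` and `transfer_motion`, and the toy values are all hypothetical, chosen only to show the arithmetic of the edit.

```python
from typing import List, Optional

Vector = List[float]  # hypothetical stand-in for a latent code


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise addition of two latent codes of equal dimension."""
    if len(a) != len(b):
        raise ValueError("latent codes must have the same dimension")
    return [x + y for x, y in zip(a, b)]


def transfer_motion(
    source_code: Vector,
    pose_motion: Optional[Vector] = None,
    expression_motion: Optional[Vector] = None,
) -> Vector:
    """Edit a source latent code by adding motion codes.

    Assumption: pose and expression motions are already disentangled,
    so either one can be added independently of the other.
    """
    edited = list(source_code)
    if pose_motion is not None:
        edited = add(edited, pose_motion)
    if expression_motion is not None:
        edited = add(edited, expression_motion)
    return edited


# Toy 4-dimensional codes (illustrative values only).
source_code = [0.5, -1.0, 2.0, 0.0]
driving_pose = [0.25, 0.0, 0.0, 0.0]
driving_expression = [0.0, 0.0, -0.5, 1.0]

# Change the expression while leaving the pose untouched, and vice versa.
expression_only = transfer_motion(source_code, expression_motion=driving_expression)
pose_only = transfer_motion(source_code, pose_motion=driving_pose)
```

In the framework described above, the edited codes would then be passed to the expression generator or the pose generator to render images; that rendering step is omitted here because the abstract does not specify it.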