Recent research has witnessed advances in facial image editing tasks, including face swapping and face reenactment. However, these methods are confined to dealing with one specific task at a time. Moreover, for video facial editing, previous methods either simply apply transformations frame by frame or utilize multiple frames in a concatenated or iterative fashion, which leads to noticeable visual flicker. In this paper, we propose a unified temporally consistent facial video editing framework termed UniFaceGAN. Based on a 3D reconstruction model and a simple yet efficient dynamic training sample selection mechanism, our framework is designed to handle face swapping and face reenactment simultaneously. To enforce temporal consistency, a novel 3D temporal loss constraint is introduced based on barycentric coordinate interpolation. In addition, we propose a region-aware conditional normalization layer to replace the traditional AdaIN or SPADE layers and synthesize more context-harmonious results. Compared with state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
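To make the temporal-consistency idea concrete, below is a minimal sketch of a 3D temporal loss based on barycentric coordinate interpolation. It assumes a fitted face mesh provides, for each pixel of frame t, the barycentric coordinates inside its covering triangle and the projected positions of that triangle's vertices at frame t+1; the exact formulation, weighting, and tensor layout used in UniFaceGAN are not specified here and are assumptions of this illustration.

```python
import torch
import torch.nn.functional as F

def barycentric_temporal_loss(frame_t, frame_t1, bary, tri_xy_t1, valid_mask):
    """Hedged sketch of a barycentric-interpolation temporal-consistency loss.

    frame_t, frame_t1 : (B, C, H, W) generated frames at times t and t+1.
    bary              : (B, H, W, 3) barycentric coords of each pixel of frame t
                        inside its covering mesh triangle (from the 3D face fit).
    tri_xy_t1         : (B, H, W, 3, 2) projected 2D positions, at time t+1,
                        of the three vertices of that same triangle.
    valid_mask        : (B, 1, H, W) 1 where the pixel is covered by the face mesh.
    """
    B, C, H, W = frame_t.shape
    # Interpolate each pixel's corresponding position at t+1 with its barycentric weights.
    corr_xy = (bary.unsqueeze(-1) * tri_xy_t1).sum(dim=3)          # (B, H, W, 2)
    # Normalize to [-1, 1] for grid_sample (x along width, y along height).
    grid = torch.stack([2 * corr_xy[..., 0] / (W - 1) - 1,
                        2 * corr_xy[..., 1] / (H - 1) - 1], dim=-1)
    # Warp frame t+1 back onto frame t's pixel grid and penalize differences
    # only where the face mesh gives a valid correspondence.
    warped_t1 = F.grid_sample(frame_t1, grid, align_corners=True)
    diff = (valid_mask * (frame_t - warped_t1).abs()).sum()
    return diff / valid_mask.sum().clamp(min=1)
```

Such a loss encourages pixels that the 3D mesh identifies as the same surface point to keep consistent appearance across adjacent generated frames, which is what suppresses frame-to-frame flicker.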