High-quality facial image editing is a challenging problem in the movie post-production industry, requiring a high degree of control and identity preservation. Previous works that attempt to tackle this problem may suffer from the entanglement of facial attributes and the loss of the person's identity. Furthermore, many algorithms are limited to a specific task. To address these limitations, we propose to edit facial attributes via the latent space of a StyleGAN generator, by training a dedicated latent transformation network and incorporating explicit disentanglement and identity-preservation terms in the loss function. We further introduce a pipeline to generalize our face editing to videos. Our model achieves disentangled, controllable, and identity-preserving facial attribute editing, even in the challenging case of real (i.e., non-synthetic) images and videos. We conduct extensive experiments on image and video datasets and show that our model outperforms other state-of-the-art methods in visual quality and quantitative evaluation. Source code is available at https://github.com/InterDigitalInc/latent-transformer.
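The core idea above can be sketched in a few lines. The following is a minimal illustrative example, not the paper's exact architecture or loss: it assumes a latent transformer that edits a StyleGAN latent code as w' = w + alpha * T(w), with a loss combining a hypothetical attribute term and a locality term standing in for the identity-preservation and disentanglement terms.

```python
import numpy as np

# Hypothetical sketch (NOT the paper's exact model): a latent transformer T
# edits a StyleGAN latent code w as w_edit = w + alpha * T(w), where alpha
# controls the strength of the attribute change.

rng = np.random.default_rng(0)
LATENT_DIM = 512  # assumed StyleGAN W-space dimensionality

# A single randomly initialized linear layer stands in for the trained
# latent transformation network.
W = rng.standard_normal((LATENT_DIM, LATENT_DIM)) * 0.01
b = np.zeros(LATENT_DIM)

def transform(w, alpha):
    """Apply the latent edit with strength alpha."""
    return w + alpha * (w @ W + b)

def edit_loss(w, w_edit, attr_pred, attr_target,
              lambda_attr=1.0, lambda_id=1.0):
    """Illustrative loss: an attribute term pushing a (hypothetical)
    attribute predictor toward the target value, plus a proximity term
    keeping the edited code close to the original -- a stand-in for the
    paper's identity-preservation and disentanglement terms."""
    loss_attr = np.mean((attr_pred - attr_target) ** 2)
    loss_id = np.mean((w_edit - w) ** 2)
    return lambda_attr * loss_attr + lambda_id * loss_id

w = rng.standard_normal(LATENT_DIM)
w_edit = transform(w, alpha=1.0)
# alpha = 0 leaves the latent code (and hence the face) unchanged
assert np.allclose(transform(w, 0.0), w)
```

Because the edit is additive in latent space, setting alpha to 0 recovers the original face exactly, and varying alpha gives continuous control over the attribute's strength.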