While recent research has progressively overcome the low-resolution constraint of one-shot face video re-enactment with the help of StyleGAN's high-fidelity portrait generation, these approaches rely on at least one of the following: explicit 2D/3D priors, optical-flow-based warping as motion descriptors, off-the-shelf encoders, etc., which constrain their performance (e.g., inconsistent predictions, inability to capture fine facial details and accessories, poor generalization, artifacts). We propose an end-to-end framework that simultaneously supports face attribute edits, facial motions and deformations, and facial identity control for video generation. It employs a hybrid latent space that encodes a given frame into a pair of latents: an identity latent, $\mathcal{W}_{ID}$, and a facial deformation latent, $\mathcal{S}_F$, which reside in the $W+$ and $SS$ spaces of StyleGAN2, respectively. This combines the impressive editability-distortion trade-off of $W+$ with the high disentanglement of $SS$. The hybrid latents are fed to the StyleGAN2 generator to achieve high-fidelity face video re-enactment at $1024^2$. Furthermore, the model supports the generation of realistic re-enactment videos with other latent-based semantic edits (e.g., beard, age, make-up, etc.). Qualitative and quantitative analyses against state-of-the-art methods demonstrate the superiority of the proposed approach.
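The latent pairing described above can be sketched minimally as follows. This is a hypothetical illustration only: the encoder functions, shapes, and file names are stand-ins (the abstract does not specify them), with the $W+$ latent shaped as 18 style vectors of 512 dimensions and the StyleSpace latent as a flat channel vector, following common StyleGAN2 conventions at $1024^2$.

```python
import numpy as np

# Hypothetical latent shapes for StyleGAN2 at 1024x1024:
# W+ is commonly 18 style vectors of 512 dims; StyleSpace S is a
# flat vector of per-layer style channels (dimension is an assumption).
W_PLUS_SHAPE = (18, 512)
S_DIM = 9088

def encode_identity(frame_path):
    """Stand-in for the identity encoder: frame -> W_ID in W+ (hypothetical)."""
    rng = np.random.default_rng(abs(hash(frame_path)) % (2**32))
    return rng.standard_normal(W_PLUS_SHAPE)

def encode_deformation(frame_path):
    """Stand-in for the deformation encoder: frame -> S_F in SS (hypothetical)."""
    rng = np.random.default_rng((abs(hash(frame_path)) + 1) % (2**32))
    return rng.standard_normal(S_DIM)

def reenact_latents(source_frame, driving_frame):
    """Pair the source's identity latent with the driver's deformation latent.

    In the full framework these two latents would jointly condition the
    StyleGAN2 generator to synthesize the re-enacted frame.
    """
    w_id = encode_identity(source_frame)      # identity from the source frame
    s_f = encode_deformation(driving_frame)   # motion/deformation from the driver
    return w_id, s_f

w_id, s_f = reenact_latents("source.png", "driving_000.png")
```

The key design point the sketch illustrates is the separation of concerns: identity is held fixed via $\mathcal{W}_{ID}$ while per-frame motion is carried by $\mathcal{S}_F$, which is what enables identity-preserving re-enactment and independent semantic editing.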