Recent advances in generative adversarial networks (GANs) have demonstrated the ability to generate strikingly photo-realistic portrait images. While some prior works have applied such image GANs to unconditional 2D portrait video generation and static 3D portrait synthesis, few have successfully extended GANs to generate 3D-aware portrait videos. In this work, we propose PV3D, the first generative framework that can synthesize multi-view consistent portrait videos. Specifically, our method extends a recent static 3D-aware image GAN to the video domain by generalizing the 3D implicit neural representation to model the spatio-temporal space. To introduce motion dynamics into the generation process, we develop a motion generator that stacks multiple motion layers to generate motion features via modulated convolution. To alleviate motion ambiguities caused by camera and human motions, we propose a simple yet effective camera condition strategy for PV3D, enabling video generation that is both temporally and multi-view consistent. Moreover, PV3D introduces two discriminators that regularize the spatial and temporal domains to ensure the plausibility of the generated portrait videos. These designs enable PV3D to generate 3D-aware, motion-plausible portrait videos with high-quality appearance and geometry, significantly outperforming prior works. As a result, PV3D supports many downstream applications, such as animating static portraits and view-consistent video motion editing. Code and models are released at https://showlab.github.io/pv3d.
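The motion generator described above stacks motion layers built on modulated convolution. The abstract does not give the layer details, so the following is only a minimal NumPy sketch, assuming StyleGAN2-style modulation and demodulation. The names `modulated_conv2d`, `motion_generator`, `leaky_relu`, the learned constant input, and the affine style mapping are all hypothetical and are not taken from the PV3D code release.

```python
import numpy as np

def modulated_conv2d(x, weight, style, eps=1e-8):
    """Modulated convolution (stride 1, zero 'same' padding, odd kernel size).

    Assumed StyleGAN2-style formulation, not confirmed by the abstract.
    x:      (N, C_in, H, W) input features
    weight: (C_out, C_in, k, k) kernel shared across samples
    style:  (N, C_in) per-sample channel scales derived from a latent code
    """
    n, c_in, h, w = x.shape
    c_out, _, k, _ = weight.shape
    # Modulate: scale the kernel's input channels separately for each sample.
    w_mod = weight[None] * style[:, None, :, None, None]  # (N, C_out, C_in, k, k)
    # Demodulate: rescale every output filter to (approximately) unit L2 norm.
    norm = np.sqrt((w_mod ** 2).sum(axis=(2, 3, 4), keepdims=True) + eps)
    w_mod = w_mod / norm
    pad = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((n, c_out, h, w), dtype=x.dtype)
    # Accumulate the contribution of each kernel offset (cross-correlation).
    for i in range(k):
        for j in range(k):
            patch = xp[:, :, i:i + h, j:j + w]  # (N, C_in, H, W)
            out += np.einsum("noc,nchw->nohw", w_mod[:, :, :, i, j], patch)
    return out

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def motion_generator(z_motion, const_input, layers):
    """Stack of hypothetical motion layers.

    z_motion:    (N, latent_dim) motion latent codes
    const_input: (C, H, W) constant starting feature map (an assumption)
    layers:      list of (affine, weight); affine is (latent_dim, C_in)
    """
    n = z_motion.shape[0]
    feat = np.broadcast_to(const_input, (n,) + const_input.shape).copy()
    for affine, weight in layers:
        style = z_motion @ affine + 1.0  # affine map from latent to channel scales
        feat = leaky_relu(modulated_conv2d(feat, weight, style))
    return feat

rng = np.random.default_rng(0)
latent_dim, channels, size = 8, 4, 5
layers = [
    (rng.normal(size=(latent_dim, channels)) * 0.1,
     rng.normal(size=(channels, channels, 3, 3)))
    for _ in range(2)
]
z = rng.normal(size=(3, latent_dim))
const = rng.normal(size=(channels, size, size))
features = motion_generator(z, const, layers)
print(features.shape)  # (3, 4, 5, 5)
```

In this sketch each motion latent modulates the convolution kernels rather than being concatenated to the features, so the same stack of layers yields different motion features per sample. How PV3D injects these features into the 3D implicit representation is not specified in the abstract and is left out here.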