Generative models have emerged as an essential building block for many image synthesis and editing tasks. Recent advances in this field have also enabled the generation of high-quality 3D and video content that exhibits multi-view or temporal consistency. In this work, we explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos. By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D videos supervised only by monocular videos. We show that our method learns a rich embedding of decomposable 3D structures and motions, enabling new visual effects of spatio-temporal renderings while producing imagery of quality comparable to that of existing 3D or video GANs.
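To make the abstract's two ingredients concrete, the sketch below illustrates one plausible reading: a time-conditioned neural implicit generator that maps a latent code, a 3D point, and a timestamp to color and density, and a time-aware discriminator that scores a pair of frames together with their time gap. This is a minimal illustration under assumed design choices, not the paper's implementation; all class names, layer sizes, and the frame-pair discriminator design are assumptions.

```python
# Minimal sketch (illustrative, not the authors' architecture) of a
# 4D GAN's two components described in the abstract.
import torch
import torch.nn as nn

class ImplicitVideoGenerator(nn.Module):
    """Maps (latent z, 3D point x, time t) -> (RGB, density).
    Rendering a frame would sample many such points along camera rays."""
    def __init__(self, z_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + 3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 density
        )

    def forward(self, z, x, t):
        # z: (B, z_dim), x: (B, 3), t: (B, 1)
        h = self.net(torch.cat([z, x, t], dim=-1))
        rgb = torch.sigmoid(h[..., :3])
        sigma = torch.relu(h[..., 3:])
        return rgb, sigma

class TimeAwareDiscriminator(nn.Module):
    """Scores a pair of frames conditioned on their time difference,
    so temporally inconsistent motion can be penalized."""
    def __init__(self, img_ch=3, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2 * img_ch, hidden, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(hidden, hidden, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(hidden + 1, 1)  # +1 for the time gap

    def forward(self, frame_a, frame_b, dt):
        feat = self.conv(torch.cat([frame_a, frame_b], dim=1)).flatten(1)
        return self.head(torch.cat([feat, dt], dim=-1))

# Shape check: query the generator at sampled points, then score a
# frame pair 16 steps apart (gap normalized by an assumed clip length).
g = ImplicitVideoGenerator()
rgb, sigma = g(torch.randn(2, 64), torch.randn(2, 3), torch.rand(2, 1))
d = TimeAwareDiscriminator()
score = d(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64),
          torch.full((2, 1), 16 / 128))
```

Conditioning the discriminator on the time gap is what makes it "time-aware": the same frame pair should be judged differently depending on how far apart in time the frames are claimed to be.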