We introduce VIVE3D, a novel approach that extends the capabilities of image-based 3D GANs to video editing and represents the input video in an identity-preserving and temporally consistent way. We propose two new building blocks. First, we introduce a novel GAN inversion technique specifically tailored to 3D GANs by jointly embedding multiple frames and optimizing for the camera parameters. Second, besides traditional semantic face edits (e.g., for age and expression), we are the first to demonstrate edits that show novel views of the head, enabled by the inherent properties of 3D GANs and our optical-flow-guided compositing technique that combines the edited head with the background video. Our experiments demonstrate that VIVE3D generates high-fidelity face edits at consistent quality across a range of camera viewpoints, and composites them with the original video in a temporally and spatially consistent manner.
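The joint multi-frame inversion can be pictured as a single optimization over a shared identity latent, small per-frame latent offsets, and per-frame camera parameters. The sketch below is a minimal, hedged illustration of that idea; `ToyGenerator`, the `(yaw, pitch)` camera parameterization, and all names and loss weights are assumptions for demonstration and do not reflect the paper's actual EG3D-based implementation.

```python
# Minimal sketch: joint multi-frame 3D-GAN inversion with camera optimization.
# ToyGenerator and the (yaw, pitch) camera parameters are illustrative stand-ins,
# not the real 3D GAN interface used by VIVE3D.
import torch
import torch.nn.functional as F

class ToyGenerator(torch.nn.Module):
    """Stand-in for a pretrained 3D GAN: maps (latent, camera) -> image."""
    def __init__(self, w_dim=512, res=64):
        super().__init__()
        self.fc = torch.nn.Linear(w_dim + 2, 3 * res * res)
        self.res = res

    def forward(self, w, cam):
        x = torch.cat([w, cam], dim=-1)
        return self.fc(x).view(-1, 3, self.res, self.res)

def invert_frames(G, frames, w_dim=512, steps=500, lr=1e-2):
    """Jointly fit one shared identity latent, per-frame latent offsets,
    and per-frame camera parameters (yaw, pitch) to several video frames."""
    n = frames.shape[0]
    w_id = torch.zeros(1, w_dim, requires_grad=True)    # shared identity code
    w_off = torch.zeros(n, w_dim, requires_grad=True)   # per-frame offsets
    cams = torch.zeros(n, 2, requires_grad=True)        # per-frame (yaw, pitch)
    opt = torch.optim.Adam([w_id, w_off, cams], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = G(w_id + w_off, cams)                    # render each frame
        loss = F.l1_loss(recon, frames)                  # photometric term
        loss = loss + 1e-3 * w_off.pow(2).mean()         # keep offsets small
        loss.backward()
        opt.step()
    return w_id.detach(), w_off.detach(), cams.detach()

if __name__ == "__main__":
    G = ToyGenerator()
    frames = torch.rand(5, 3, 64, 64)                    # 5 dummy frames
    w_id, w_off, cams = invert_frames(G, frames, steps=50)
    print(cams)
```

In this toy setup the shared latent captures the person's identity across the clip, while the regularized per-frame offsets and camera parameters absorb expression and head-pose changes; a full system would replace the photometric L1 loss with perceptual and identity terms and use the actual 3D GAN renderer.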