Recent progress in NeRF-based GANs has introduced a number of approaches for high-resolution, high-fidelity generative modeling of human heads with the possibility of novel-view rendering. At the same time, re-rendering or modifying an existing image or video requires solving an inverse problem. Despite the success of universal optimization-based methods for 2D GAN inversion, these methods, when applied to 3D GANs, may fail to produce 3D-consistent renderings. Fast encoder-based techniques, such as those developed for StyleGAN, may also be less appealing due to their lack of identity preservation. In our work, we introduce a real-time method that bridges the gap between the two approaches by directly utilizing the tri-plane representation introduced for the EG3D generative model. In particular, we build upon a feed-forward convolutional encoder for the latent code and extend it with a fully convolutional predictor of numerical tri-plane offsets. As shown in our work, the renderings are similar in quality to those of optimization-based techniques and significantly outperform the baselines for novel views. As we empirically show, this is a consequence of operating directly in the tri-plane space rather than in the GAN parameter space, while making use of an encoder-based trainable approach.
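The two-branch design described above can be sketched as follows. This is a minimal, shape-level illustration only, not the paper's implementation: the function names (`encode_latent`, `generator_triplanes`, `predict_offsets`) and all dimensions are assumptions, and the randomly initialized stand-ins replace the real convolutional networks and EG3D generator. The point it shows is the composition: refinement happens additively in the tri-plane space, not in the latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_latent(image):
    """Stand-in for the feed-forward convolutional latent-code encoder (assumed name)."""
    pooled = image.mean(axis=(1, 2))                   # crude global pooling, shape (C,)
    proj = rng.standard_normal((512, pooled.shape[0])) # random projection as a placeholder
    return proj @ pooled                               # latent code w, shape (512,)

def generator_triplanes(w, res=16, feat=4):
    """Stand-in for the EG3D generator's tri-plane synthesis G(w) (toy resolution)."""
    # Three axis-aligned feature planes: shape (3, feat, res, res).
    return np.tanh(rng.standard_normal((3, feat, res, res)) * w.std())

def predict_offsets(image, res=16, feat=4):
    """Stand-in for the fully convolutional tri-plane offset predictor."""
    # The real branch is a conv net over image features; here we only
    # emit a small correction field with the matching tri-plane shape.
    return 0.1 * rng.standard_normal((3, feat, res, res))

def invert(image):
    """Encoder-based inversion: latent pass through the generator,
    then an additive correction applied directly to the tri-planes."""
    w = encode_latent(image)
    planes = generator_triplanes(w)
    return planes + predict_offsets(image)

image = rng.standard_normal((3, 64, 64))  # toy input "photo"
triplanes = invert(image)
print(triplanes.shape)  # (3, 4, 16, 16)
```

Because both branches are feed-forward, a single pass yields the refined tri-planes, which is what makes the approach real-time compared to per-image optimization.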