What is really needed to make an existing 2D GAN 3D-aware? To answer this question, we modify a classical GAN, namely StyleGANv2, as little as possible. We find that only two modifications are absolutely necessary: 1) a multiplane image style generator branch which produces a set of alpha maps conditioned on their depth; 2) a pose-conditioned discriminator. We refer to the generated output as a 'generative multiplane image' (GMPI) and emphasize that its renderings are not only high-quality but also guaranteed to be view-consistent, which makes GMPIs different from many prior works. Importantly, the number of alpha maps can be dynamically adjusted and can differ between training and inference, alleviating memory concerns and enabling fast training of GMPIs in less than half a day at a resolution of $1024^2$. Our findings are consistent across three challenging and common high-resolution datasets: FFHQ, AFHQv2, and MetFaces.
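The view-consistency guarantee stems from the rendering model rather than from the network: every target view is produced by warping and alpha-compositing the same fixed stack of fronto-parallel planes. The sketch below illustrates the standard MPI "over" compositing step only; it is not the authors' implementation. The function name `composite_mpi`, the tensor shapes, and the assumption that each plane's color and alpha have already been warped into the target camera (e.g., via a planar homography) are illustrative choices.

```python
import torch

def composite_mpi(colors: torch.Tensor, alphas: torch.Tensor) -> torch.Tensor:
    """Front-to-back alpha compositing of a multiplane image.

    colors: (B, L, 3, H, W)  per-plane RGB, already warped to the target view
    alphas: (B, L, 1, H, W)  per-plane alpha maps; plane 0 is the nearest
    returns (B, 3, H, W)     the rendered image
    """
    # Transmittance before plane i: T_i = prod_{j < i} (1 - alpha_j),
    # with T_0 = 1 for the nearest plane.
    ones = torch.ones_like(alphas[:, :1])
    transmittance = torch.cumprod(
        torch.cat([ones, 1.0 - alphas[:, :-1]], dim=1), dim=1
    )
    # Per-plane contribution weights w_i = alpha_i * T_i.
    weights = alphas * transmittance          # (B, L, 1, H, W)
    # Weighted sum over the plane dimension yields the composite.
    return (colors * weights).sum(dim=1)      # broadcasts over RGB channels

# Hypothetical usage: the plane count L is only a tensor dimension here,
# consistent with the abstract's point that the number of alpha maps can
# differ between training and inference.
B, L, H, W = 2, 32, 256, 256
colors = torch.rand(B, L, 3, H, W)
alphas = torch.rand(B, L, 1, H, W)
image = composite_mpi(colors, alphas)         # (B, 3, H, W)
```

Because any target view is obtained by re-warping and compositing the same generated planes, two renderings of one GMPI cannot disagree about scene content; consistency is a property of this renderer, not something the discriminator must enforce.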