We propose a generative framework, FaceLit, capable of generating a 3D face that can be rendered under various user-defined lighting conditions and views, learned purely from in-the-wild 2D images without any manual annotation. Unlike existing works that require a careful capture setup or human labor, we rely on off-the-shelf pose and illumination estimators. With these estimates, we incorporate the Phong reflectance model into the neural volume rendering framework. Our model learns to generate the shape and material properties of a face such that, when rendered according to the natural statistics of pose and illumination, it produces photorealistic face images with multiview 3D and illumination consistency. Our method enables photorealistic generation of faces with explicit illumination and view controls on multiple datasets: FFHQ, MetFaces, and CelebA-HQ. We show state-of-the-art photorealism among 3D-aware GANs on the FFHQ dataset, achieving an FID score of 3.5.
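As context for the rendering model referenced above, the classical Phong reflectance combines ambient, diffuse, and specular terms per surface point. The sketch below is a minimal illustration of that formula only; the function name, coefficient values, and scalar (grayscale) setup are illustrative assumptions, not the paper's actual implementation, which evaluates such terms inside a neural volume renderer.

```python
import numpy as np

def phong_shade(normal, light_dir, view_dir, albedo,
                k_ambient=0.1, k_diffuse=0.7, k_specular=0.2, shininess=16.0):
    """Scalar Phong reflectance at one surface point.

    All direction vectors are 3D; `albedo` is a grayscale reflectance.
    Coefficients are hypothetical defaults for illustration.
    """
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)   # points toward the light
    v = view_dir / np.linalg.norm(view_dir)     # points toward the camera
    # Diffuse term: Lambertian cosine, clamped so back-facing light adds nothing.
    diff = max(np.dot(n, l), 0.0)
    # Specular term: reflect the light direction about the normal,
    # then compare the reflection with the view direction.
    r = 2.0 * np.dot(n, l) * n - l
    spec = max(np.dot(r, v), 0.0) ** shininess
    return albedo * (k_ambient + k_diffuse * diff) + k_specular * spec

# Head-on light and camera: full diffuse + full specular contribution.
z = np.array([0.0, 0.0, 1.0])
print(phong_shade(z, z, z, albedo=1.0))  # 0.1 + 0.7 + 0.2 = 1.0
```

In FaceLit, quantities analogous to the albedo and shininess above are generated per point by the network, while the lighting comes from the off-the-shelf illumination estimator, which is what makes relighting at inference time explicit and controllable.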