In this work, we address the problem of jointly estimating albedo, normals, depth, and 3D spatially-varying lighting from a single image. Most existing methods formulate the task as image-to-image translation, ignoring the 3D properties of the scene. However, indoor scenes contain complex 3D light transport for which a 2D representation is insufficient. In this paper, we propose a unified, learning-based inverse rendering framework that explicitly models 3D spatially-varying lighting. Inspired by classic volume rendering techniques, we propose a novel Volumetric Spherical Gaussian representation for lighting, which parameterizes the exitant radiance of 3D scene surfaces on a voxel grid. We design a physics-based differentiable renderer that utilizes our 3D lighting representation and formulates an energy-conserving image formation process, enabling joint training of all intrinsic properties with a re-rendering constraint. Our model ensures physically correct predictions and avoids the need for ground-truth HDR lighting, which is not easily accessible. Experiments show that our method outperforms prior work both quantitatively and qualitatively, and produces photorealistic results for AR applications such as virtual object insertion, even for highly specular objects.
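To make the lighting representation concrete, the following is a minimal NumPy sketch of how a volumetric spherical-Gaussian grid could be queried: each voxel stores an opacity and one spherical Gaussian lobe (axis, sharpness, RGB amplitude), and radiance along a ray is alpha-composited front to back in the spirit of classic volume rendering. The function names (`evaluate_sg`, `world_to_voxel`, `query_radiance`), the single-lobe-per-voxel layout, and the unit-cube grid are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def evaluate_sg(direction, axis, sharpness, amplitude):
    """One spherical Gaussian lobe: G(v) = amplitude * exp(sharpness * (v . axis - 1))."""
    cos_term = np.clip(direction @ axis, -1.0, 1.0)
    return amplitude * np.exp(sharpness * (cos_term - 1.0))

def world_to_voxel(point, resolution):
    """Map a point in the unit cube [0,1)^3 to its voxel index, or None if outside."""
    if np.any(point < 0.0) or np.any(point >= 1.0):
        return None
    return tuple((point * resolution).astype(int))

def query_radiance(grid, origin, direction, step=0.02, n_steps=128):
    """Alpha-composite spherical-Gaussian radiance along a ray, front to back."""
    radiance = np.zeros(3)
    transmittance = 1.0
    for i in range(n_steps):
        p = origin + (i + 0.5) * step * direction
        idx = world_to_voxel(p, grid['alpha'].shape[0])
        if idx is None:  # sample falls outside the grid
            continue
        alpha = grid['alpha'][idx]
        # Exitant radiance of the voxel toward the query point, i.e. along -direction.
        L = evaluate_sg(-direction, grid['axis'][idx],
                        grid['sharpness'][idx], grid['amplitude'][idx])
        radiance += transmittance * alpha * L
        transmittance *= 1.0 - alpha
        if transmittance < 1e-3:  # ray is effectively opaque; stop early
            break
    return radiance

# Toy usage: an 8^3 grid with random lobe parameters, queried from the grid center.
rng = np.random.default_rng(0)
R = 8
grid = {
    'alpha': rng.uniform(0.0, 0.3, (R, R, R)),
    'axis': rng.normal(size=(R, R, R, 3)),
    'sharpness': rng.uniform(1.0, 20.0, (R, R, R)),
    'amplitude': rng.uniform(0.0, 1.0, (R, R, R, 3)),
}
grid['axis'] /= np.linalg.norm(grid['axis'], axis=-1, keepdims=True)
print(query_radiance(grid, origin=np.array([0.5, 0.5, 0.5]),
                     direction=np.array([0.0, 0.0, 1.0])))
```

In practice such a grid would be predicted by a network and queried densely inside a differentiable renderer; the sketch only illustrates how spherical Gaussian lobes on a voxel grid can yield a queryable 3D spatially-varying lighting field.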