We propose the NeRF-LEBM, a likelihood-based top-down 3D-aware 2D image generative model that incorporates 3D representation via Neural Radiance Fields (NeRF) and the 2D imaging process via differentiable volume rendering. The model represents an image as a rendering process from a 3D object to a 2D image, conditioned on latent variables that account for object characteristics and are assumed to follow informative trainable energy-based prior models. We propose two likelihood-based learning frameworks to train the NeRF-LEBM: (i) maximum likelihood estimation with Markov chain Monte Carlo-based inference and (ii) variational inference with the reparameterization trick. We study our models in scenarios with both known and unknown camera poses. Experiments on several benchmark datasets demonstrate that the NeRF-LEBM can infer 3D object structures from 2D images, generate 2D images with novel views and objects, learn from incomplete 2D images, and learn from 2D images with known or unknown camera poses.
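To make the first learning framework concrete, below is a minimal sketch (not the authors' code) of likelihood-based training for a generator with a latent-space energy-based prior and short-run Langevin (MCMC) inference. A small MLP decoder stands in for the NeRF generator plus volume renderer, and all names (`energy`, `decoder`, `langevin`, the dimensions and step sizes) are illustrative assumptions.

```python
# Sketch of MLE training with MCMC inference for a latent-space EBM generator.
# A toy MLP decoder replaces the NeRF + differentiable volume renderer here.
import torch
import torch.nn as nn

Z_DIM, X_DIM, SIGMA = 16, 64, 0.3   # latent size, flattened image size, observation noise

energy = nn.Sequential(nn.Linear(Z_DIM, 128), nn.GELU(), nn.Linear(128, 1))      # f_alpha(z)
decoder = nn.Sequential(nn.Linear(Z_DIM, 128), nn.GELU(), nn.Linear(128, X_DIM))  # stand-in for NeRF + renderer

def langevin(z, log_prob, steps=20, step_size=0.1):
    """Short-run Langevin dynamics: z <- z + (s^2/2) * grad log p(z) + s * noise."""
    for _ in range(steps):
        z = z.detach().requires_grad_(True)
        grad = torch.autograd.grad(log_prob(z).sum(), z)[0]
        z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
    return z.detach()

def log_prior(z):                    # unnormalized EBM prior: -f_alpha(z) - ||z||^2 / 2
    return -energy(z).squeeze(-1) - 0.5 * (z ** 2).sum(-1)

def log_joint(z, x):                 # log p_alpha(z) + log p_theta(x | z), Gaussian observation model
    return log_prior(z) - 0.5 * ((x - decoder(z)) ** 2).sum(-1) / SIGMA ** 2

opt = torch.optim.Adam(list(energy.parameters()) + list(decoder.parameters()), lr=1e-4)

def train_step(x):                   # x: (batch, X_DIM) observed images
    z0 = torch.randn(x.size(0), Z_DIM)
    z_prior = langevin(z0, log_prior)                          # MCMC sample from the EBM prior
    z_post = langevin(z0.clone(), lambda z: log_joint(z, x))   # MCMC sample from the posterior
    opt.zero_grad()
    loss_ebm = (energy(z_post) - energy(z_prior)).mean()       # MLE gradient for the prior parameters
    loss_gen = 0.5 * ((x - decoder(z_post)) ** 2).sum(-1).mean() / SIGMA ** 2  # generator reconstruction term
    (loss_ebm + loss_gen).backward()
    opt.step()
    return loss_gen.item()

# Usage: one training step on random stand-in "images".
print(train_step(torch.randn(8, X_DIM)))
```

The second framework would replace the posterior Langevin sampling with an amortized inference network trained via the reparameterization trick, leaving the EBM prior and generator updates structurally similar.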