We present a method for estimating neural scene representations of objects given only a single image. The core of our method is the estimation of a geometric scaffold for the object and its use as a guide for the reconstruction of the underlying radiance field. Our formulation is based on a generative process that first maps a latent code to a voxelized shape and then renders it to an image, with the object's appearance controlled by a second latent code. During inference, we optimize both the latent codes and the networks to fit a test image of a new object. The explicit disentanglement of shape and appearance allows our model to be fine-tuned given a single image. We can then render new views in a geometrically consistent manner that faithfully represent the input object. Additionally, our method generalizes to images outside the training domain (more realistic renderings and even real photographs). Finally, the inferred geometric scaffold is itself an accurate estimate of the object's 3D shape. We demonstrate the effectiveness of our approach in several experiments on both synthetic and real images.