We propose im2nerf, a learning framework that predicts a continuous neural object representation from a single input image in the wild, supervised only by segmentation output from off-the-shelf recognition methods. The standard approach to constructing neural radiance fields exploits multi-view consistency and requires many calibrated views of a scene, a requirement that cannot be satisfied when learning from large-scale image data in the wild. We take a step towards addressing this shortcoming by introducing a model that encodes the input image into a disentangled object representation containing a code for object shape, a code for object appearance, and an estimated camera pose from which the object image was captured. Our model conditions a NeRF on the predicted object representation and uses volume rendering to generate images from novel views. We train the model end-to-end on a large collection of input images. Because the model is provided only with single-view images, the problem is highly under-constrained. Therefore, in addition to a reconstruction loss on the synthesized input view, we use an auxiliary adversarial loss on the rendered novel views. Furthermore, we leverage object symmetry and cycle consistency of the camera pose. We conduct extensive quantitative and qualitative experiments on the ShapeNet dataset, as well as qualitative experiments on the Open Images dataset. We show that in all cases, im2nerf achieves state-of-the-art performance for novel view synthesis from a single-view unposed image in the wild.
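The abstract states that the model uses volume rendering to generate images but does not give the formula. As an illustration only, the sketch below implements the standard NeRF quadrature for compositing samples along a single ray; it is assumed here, not confirmed by the abstract, that im2nerf uses this form. The function name `composite_ray` and its arguments are hypothetical, and the densities and colors would in practice come from the NeRF conditioned on the predicted shape and appearance codes.

```python
import numpy as np


def composite_ray(sigmas, colors, deltas):
    """Alpha-composite samples along one ray (standard NeRF quadrature).

    Hypothetical helper for illustration; not taken from the im2nerf code.
    sigmas: (N,) non-negative densities at the N samples
    colors: (N, 3) RGB values at the samples
    deltas: (N,) distances between consecutive samples
    """
    sigmas = np.asarray(sigmas, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)

    # Opacity contributed by each sample interval.
    alphas = 1.0 - np.exp(-sigmas * deltas)

    # Transmittance reaching each sample: product of (1 - alpha) over
    # all earlier samples (exclusive cumulative product).
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))

    # Per-sample contribution to the pixel, then the composited color.
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights


# Usage: three samples along a ray, each with a distinct color.
sample_colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rgb, weights = composite_ray([0.5, 2.0, 4.0], sample_colors, [0.1, 0.1, 0.1])
```

The summed weights give the accumulated opacity of the ray, which is at most 1; a ray that passes only through empty space (all densities zero) receives zero weight everywhere.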