Neural Radiance Fields (NeRF) coupled with GANs represent a promising direction in the area of 3D reconstruction from a single view, owing to their ability to efficiently model arbitrary topologies. Recent work in this area, however, has mostly focused on synthetic datasets where exact ground-truth poses are known, and has overlooked pose estimation, which is important for certain downstream applications such as augmented reality (AR) and robotics. We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available. Our approach recovers an SDF-parameterized 3D shape, pose, and appearance from a single image of an object, without exploiting multiple views during training. More specifically, we leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution which is then refined via optimization. Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios. We demonstrate state-of-the-art results on a variety of real and synthetic benchmarks.
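The hybrid inversion scheme described above can be illustrated with a toy example: a cheap encoder produces a rough first guess of the latent code, which a handful of gradient steps then refine against the reconstruction loss. The sketch below is a deliberately minimal linear stand-in of our own construction (the generator `G`, encoder `E`, and all hyperparameters here are illustrative assumptions, not the paper's 3D-aware NeRF/GAN model):

```python
import numpy as np

# Toy linear "generator": maps a latent code z to an observation x = G(z).
# In the actual framework this would be a pretrained 3D-aware generator.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 4))           # stand-in generator weights
G = lambda z: A @ z                        # latent code -> image

# Hypothetical cheap encoder giving a rough feed-forward first guess.
E = lambda x: (A.T @ x) / 16.0

def invert(x, steps=10, lr=0.03):
    """Hybrid inversion: encoder initialization + a few optimization steps."""
    z = E(x)                               # step 1: feed-forward first guess
    for _ in range(steps):                 # step 2: refine by gradient descent
        residual = G(z) - x                # reconstruction error
        z = z - lr * (A.T @ residual)      # gradient of 0.5 * ||G(z) - x||^2
    return z

# "De-render" a target produced by an unknown latent in 10 refinement steps.
z_true = rng.standard_normal(4)
x_target = G(z_true)
z_hat = invert(x_target, steps=10)
```

The design mirrors the trade-off the abstract points at: the encoder alone is fast but approximate, pure optimization alone is accurate but slow, and initializing the optimizer with the encoder's guess keeps the step count small.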