Neural Radiance Fields (NeRF) coupled with GANs are a promising direction for single-view 3D reconstruction, owing to their ability to efficiently model arbitrary topologies. Recent work in this area, however, has mostly focused on synthetic datasets where exact ground-truth poses are known, and has overlooked pose estimation, which is important for downstream applications such as augmented reality (AR) and robotics. We introduce a principled end-to-end reconstruction framework for natural images, for which accurate ground-truth poses are not available. Our approach recovers an SDF-parameterized 3D shape, pose, and appearance from a single image of an object, without exploiting multiple views during training. More specifically, we leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme: a model produces a first guess of the solution, which is then refined via optimization. Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios. We demonstrate state-of-the-art results on a variety of real and synthetic benchmarks.
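The hybrid inversion idea (a first guess followed by a short optimization) can be illustrated with a minimal sketch. This is not the framework described above: the "generator" here is a toy linear map, the "first guess" is a crude back-projection standing in for a learned predictor, and all names (`make_toy_generator`, `initial_guess`, `refine`, `hybrid_invert`) are hypothetical. The only element taken from the text is the structure of the scheme and the budget of 10 refinement steps.

```python
import numpy as np


def make_toy_generator(latent_dim=3, image_dim=8, seed=0):
    """Toy stand-in for a generator: a fixed random linear map (assumption)."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(image_dim, latent_dim))


def render(A, z):
    """'Render' a latent code z into a toy observation."""
    return A @ z


def loss(A, z, x):
    """Reconstruction loss 0.5 * ||render(z) - x||^2."""
    residual = render(A, z) - x
    return 0.5 * float(residual @ residual)


def step_size(A):
    """1 / L, where L is the largest eigenvalue of A^T A (safe for gradient descent)."""
    return 1.0 / np.linalg.norm(A, 2) ** 2


def initial_guess(A, x):
    """Hypothetical stand-in for the learned first guess: a scaled back-projection."""
    return step_size(A) * (A.T @ x)


def refine(A, z, x, steps=10):
    """Refine the guess with a few gradient-descent steps on the reconstruction loss."""
    lr = step_size(A)
    for _ in range(steps):
        grad = A.T @ (render(A, z) - x)
        z = z - lr * grad
    return z


def hybrid_invert(A, x, steps=10):
    """First guess, then optimization. Returns both so they can be compared."""
    z0 = initial_guess(A, x)
    return z0, refine(A, z0, x, steps)


if __name__ == "__main__":
    A = make_toy_generator()
    x = render(A, np.array([0.5, -1.0, 2.0]))
    z0, z = hybrid_invert(A, x)
    print("loss after first guess:", loss(A, z0, x))
    print("loss after refinement: ", loss(A, z, x))
```

In this toy setting the refinement can only lower the reconstruction loss, because the step size is chosen from the spectral norm of the map. With a real 3D-aware generator the loss is non-convex, so the quality of the first guess matters far more than it does here.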