While object reconstruction has made great strides in recent years, current methods typically require densely captured images and/or known camera poses, and generalize poorly to novel object categories. To step toward object reconstruction in the wild, this work explores reconstructing general real-world objects from a few images without known camera poses or object categories. The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation -- in a unified approach. Our approach captures the synergies of these two problems: reliable camera pose estimation gives rise to accurate shape reconstruction, and the accurate reconstruction, in turn, induces robust correspondence between different views and facilitates pose estimation. Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence for estimating relative camera poses. The 3D features are then transformed by the estimated poses into a shared space and are fused into a neural radiance field. The reconstruction results are rendered by volume rendering techniques, enabling us to train the model without 3D shape ground-truth. Our experiments show that FORGE reliably reconstructs objects from five views. Our pose estimation method outperforms existing ones by a large margin. The reconstruction results under predicted poses are comparable to the ones using ground-truth poses. The performance on novel testing categories matches the results on categories seen during training. Project page: https://ut-austin-rpl.github.io/FORGE/
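The rendering step described above ("rendered by volume rendering techniques") refers to standard differentiable volume rendering as used in neural radiance fields. A minimal NumPy sketch of the per-ray compositing is shown below; the function name, shapes, and sample counts are illustrative assumptions, not FORGE's actual implementation:

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """NeRF-style alpha compositing along a single ray.

    densities: (N,) non-negative volume density sigma at each sample
    colors:    (N, 3) RGB color at each sample
    deltas:    (N,) distances between consecutive samples
    Returns the rendered (3,) RGB value for the ray.
    """
    # Convert densities to per-sample opacities.
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance T_i = prod_{j < i} (1 - alpha_j); small epsilon avoids
    # zero products when a sample is fully opaque.
    trans = np.cumprod(1.0 - alphas + 1e-10)
    trans = np.concatenate([[1.0], trans[:-1]])
    # Each sample contributes alpha_i * T_i of its color.
    weights = alphas * trans
    return (weights[:, None] * colors).sum(axis=0)
```

Because every operation here is differentiable, a reconstruction loss on the rendered pixels can be backpropagated to the radiance field, which is what allows training without 3D shape ground-truth.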