We present MVSNeRF, a novel neural rendering approach that can efficiently reconstruct neural radiance fields for view synthesis. Unlike prior works on neural radiance fields that consider per-scene optimization on densely captured images, we propose a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference. Our approach leverages plane-swept cost volumes (widely used in multi-view stereo) for geometry-aware scene reasoning, and combines this with physically based volume rendering for neural radiance field reconstruction. We train our network on real objects in the DTU dataset, and test it on three different datasets to evaluate its effectiveness and generalizability. Our approach can generalize across scenes (even indoor scenes, completely different from our training scenes of objects) and generate realistic view synthesis results using only three input images, significantly outperforming concurrent works on generalizable radiance field reconstruction. Moreover, if dense images are captured, our estimated radiance field representation can be easily fine-tuned; this leads to fast per-scene reconstruction with higher rendering quality and substantially less optimization time than NeRF.
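For reference, the physically based volume rendering mentioned above follows the standard differentiable accumulation used in NeRF-style radiance field methods; a minimal sketch in conventional notation (the sample spacing \delta_k, density \sigma_k, and color c_k are assumed symbols for illustration, not taken from the paper body) is

    C(\mathbf{r}) = \sum_{k=1}^{N} T_k \bigl(1 - e^{-\sigma_k \delta_k}\bigr)\, c_k, \qquad T_k = \exp\!\Bigl(-\sum_{j=1}^{k-1} \sigma_j \delta_j\Bigr),

where C(\mathbf{r}) is the rendered color of camera ray \mathbf{r} and T_k is the accumulated transmittance up to the k-th sample along the ray.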