Multi-View Stereo (MVS) is a core task in 3D computer vision. With the surge of novel deep learning methods, learned MVS has surpassed the accuracy of classical approaches, but still relies on building a memory-intensive dense cost volume. Novel View Synthesis (NVS) is a parallel line of research that has recently seen a surge in popularity with Neural Radiance Field (NeRF) models, which optimize a per-scene radiance field. However, NeRF methods do not generalize to novel scenes and are slow to train and test. We propose to bridge the gap between these two methodologies with a novel network that can recover 3D scene geometry as a distance function, together with high-resolution color images. Our method uses only a sparse set of images as input and generalizes well to novel scenes. Additionally, we propose a coarse-to-fine sphere tracing approach that significantly increases speed. We show on various datasets that our method achieves accuracy comparable to per-scene optimized methods while generalizing to novel scenes and running significantly faster.
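To make the rendering component concrete: the abstract does not spell out the coarse-to-fine scheme, so the following is only a minimal NumPy sketch of generic sphere tracing against a signed distance function, with a hypothetical two-stage coarse/fine refinement standing in for the paper's method. All function names, step counts, and thresholds here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sphere_trace(sdf, origin, direction, max_steps=64, eps=1e-4, t_max=10.0):
    """March a ray through a signed distance field until it hits the surface.

    sdf: callable mapping a 3D point (np.ndarray, shape (3,)) to its distance
         to the nearest surface. origin/direction: ray origin and unit direction.
    Returns the hit distance t along the ray, or None if the ray escapes.
    """
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * direction
        d = sdf(p)
        if d < eps:   # close enough to the surface: report a hit
            return t
        t += d        # safe step: the sphere of radius d around p is empty
        if t > t_max:
            break
    return None

def coarse_to_fine_trace(sdf, origin, direction):
    """Hypothetical two-stage tracer: a cheap coarse pass with few steps and a
    loose threshold locates an approximate hit, then a fine pass restarts just
    in front of it with a tight threshold to refine the intersection."""
    t_coarse = sphere_trace(sdf, origin, direction, max_steps=16, eps=1e-2)
    if t_coarse is None:
        return None
    t_start = max(t_coarse - 1e-2, 0.0)
    t_fine = sphere_trace(sdf, origin + t_start * direction, direction,
                          max_steps=32, eps=1e-4)
    return None if t_fine is None else t_start + t_fine

# Toy usage: trace a ray toward a unit sphere at the origin.
unit_sphere = lambda p: np.linalg.norm(p) - 1.0
t = coarse_to_fine_trace(unit_sphere,
                         np.array([0.0, 0.0, -3.0]),
                         np.array([0.0, 0.0, 1.0]))
print(t)  # ~2.0: the ray travels from z=-3 to the sphere surface at z=-1
```

The coarse pass amortizes most of the marching cost over few large steps; only rays that actually approach the surface pay for the fine refinement, which is the intuition behind the speedup the abstract claims.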