We present a super-fast convergence approach to reconstructing the per-scene radiance field from a set of images that capture the scene with known poses. This task, which often targets novel view synthesis, was recently revolutionized by Neural Radiance Fields (NeRF) thanks to its state-of-the-art quality and flexibility. However, NeRF and its variants require lengthy training, ranging from hours to days, for a single scene. In contrast, our approach achieves NeRF-comparable quality and converges rapidly from scratch in less than 15 minutes on a single GPU. We adopt a representation consisting of a density voxel grid for scene geometry and a feature voxel grid with a shallow network for complex view-dependent appearance. Modeling with explicit and discretized volume representations is not new, but we propose two simple yet non-trivial techniques that contribute to fast convergence and high-quality output. First, we introduce post-activation interpolation on voxel density, which can produce sharp surfaces at lower grid resolutions. Second, direct voxel density optimization is prone to suboptimal geometry solutions, so we robustify the optimization process by imposing several priors. Finally, evaluation on five inward-facing benchmarks shows that our method matches, if not surpasses, NeRF's quality, while taking only about 15 minutes to train from scratch for a new scene.
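To make the post-activation idea concrete, below is a minimal PyTorch sketch contrasting it with pre-activation interpolation. It is an illustrative assumption, not the paper's released code: the function names (`interp`, `alpha_post_activation`, `alpha_pre_activation`), the softplus activation, and the alpha-compositing details are ours; the key point is only the ordering of interpolation and activation.

```python
# Minimal sketch: post-activation vs. pre-activation trilinear interpolation
# of a density voxel grid. Names and details are illustrative assumptions.
import torch
import torch.nn.functional as F

def interp(grid, pts):
    """Trilinearly sample a (1, 1, D, H, W) grid at points in [-1, 1]^3."""
    # grid_sample expects query coordinates shaped (1, N, 1, 1, 3) for 3D input.
    pts = pts.view(1, -1, 1, 1, 3)
    out = F.grid_sample(grid, pts, mode='bilinear', align_corners=True)
    return out.view(-1)

def alpha_post_activation(raw_density_grid, pts, delta):
    # Post-activation: interpolate the raw grid values first, then apply the
    # nonlinearity. Because the activation acts on the interpolated value,
    # alpha can jump from ~0 to ~1 inside a single cell, so a sharp surface
    # can be represented at sub-voxel precision.
    sigma = F.softplus(interp(raw_density_grid, pts))
    return 1.0 - torch.exp(-sigma * delta)

def alpha_pre_activation(raw_density_grid, pts, delta):
    # Pre-activation (for contrast): activate at the grid nodes, then
    # interpolate. The result varies smoothly inside each cell and tends to
    # blur surface boundaries at low grid resolution.
    sigma = interp(F.softplus(raw_density_grid), pts)
    return 1.0 - torch.exp(-sigma * delta)

# Usage: query opacity at a few sample points along a ray.
grid = torch.randn(1, 1, 160, 160, 160)   # raw (pre-activation) density grid
pts = torch.rand(4, 3) * 2 - 1            # query points in [-1, 1]^3
print(alpha_post_activation(grid, pts, delta=0.01))
```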