This paper aims to reduce the rendering time of generalizable radiance fields. Some recent works equip neural radiance fields with image encoders, which lets them generalize across scenes and avoid per-scene optimization. However, their rendering process is generally very slow. A major factor is that they sample many points in empty space when inferring radiance fields. In this paper, we present a hybrid scene representation that combines the best of implicit radiance fields and explicit depth maps for efficient rendering. Specifically, we first build a cascade cost volume to efficiently predict the coarse geometry of the scene. The coarse geometry allows us to sample only a few points near the scene surface, which significantly improves rendering speed. This process is fully differentiable, enabling us to jointly learn the depth prediction and radiance field networks from only RGB images. Experiments show that the proposed approach achieves state-of-the-art performance on the DTU, Real Forward-facing and NeRF Synthetic datasets while being at least 50 times faster than previous generalizable radiance field methods. We also demonstrate that our method can synthesize free-viewpoint videos of dynamic human performers in real time. The code will be available at https://zju3dv.github.io/enerf/.
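To make the speed argument concrete, the sketch below contrasts uniform sampling over a whole ray segment with sampling only a few points around a predicted surface depth. It is a minimal illustration under stated assumptions, not the authors' implementation. The function names `uniform_samples` and `depth_guided_samples` are hypothetical. The fixed `radius` around the coarse depth is an assumed parameter, since this abstract does not specify how the sampling interval is chosen. The sample counts in the usage example (128 vs. 2) are illustrative only and are not taken from the paper.

```python
from typing import List


def uniform_samples(near: float, far: float, n: int) -> List[float]:
    """Baseline: n evenly spaced depths (bin midpoints) over the whole [near, far] ray segment.

    Most of these points fall in empty space, which is the cost the abstract points to.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    step = (far - near) / n
    return [near + (i + 0.5) * step for i in range(n)]


def depth_guided_samples(depth: float, radius: float, near: float, far: float, n: int) -> List[float]:
    """Hypothetical depth-guided sampler: n depths concentrated near the predicted surface.

    Assumption: the interval is [depth - radius, depth + radius] with a fixed radius,
    clipped to [near, far]. The paper's actual interval rule is not given in the abstract.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    depth = min(max(depth, near), far)  # keep the coarse depth inside the valid range
    lo = max(near, depth - radius)
    hi = min(far, depth + radius)
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)]


if __name__ == "__main__":
    baseline = uniform_samples(near=2.0, far=6.0, n=128)
    guided = depth_guided_samples(depth=4.0, radius=0.1, near=2.0, far=6.0, n=2)
    # Radiance-field network queries per ray drop from 128 to 2 in this toy setting.
    print(len(baseline), len(guided))  # 128 2
```

Because the number of network evaluations scales with pixels times samples per ray, shrinking the per-ray sample count is where the speedup comes from in this toy model. In a differentiable pipeline such as the one the abstract describes, the coarse depth would come from a learned predictor rather than being passed in as a constant.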