We present GeoNeRF, a generalizable photorealistic novel view synthesis method based on neural radiance fields. Our approach consists of two main stages: a geometry reasoner and a renderer. To render a novel view, the geometry reasoner first constructs cascaded cost volumes for each nearby source view. Then, using a Transformer-based attention mechanism and the cascaded cost volumes, the renderer infers geometry and appearance, and renders detailed images via classical volume rendering techniques. In particular, this architecture enables sophisticated occlusion reasoning by gathering information from consistent source views. Moreover, our method can easily be fine-tuned on a single scene, where it achieves results competitive with per-scene optimized neural rendering methods at a fraction of the computational cost. Experiments show that GeoNeRF outperforms state-of-the-art generalizable neural rendering models on various synthetic and real datasets. Lastly, with a slight modification to the geometry reasoner, we also propose an alternative model that adapts to RGBD images. This model directly exploits the depth information often available thanks to depth sensors. The implementation code will be publicly available.
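For readers unfamiliar with the classical volume rendering step mentioned above, the following is a minimal sketch of the standard compositing quadrature used by NeRF-style methods. It is not GeoNeRF's implementation: the function name `composite_ray` and its arguments are hypothetical, and the densities and colors, which in GeoNeRF would come from the renderer, are plain inputs here.

```python
import math


def composite_ray(sigmas, colors, deltas):
    """Composite samples along one ray with the standard volume rendering quadrature.

    sigmas: volume densities at the N samples (assumed inputs, not predicted here)
    colors: N (r, g, b) tuples, one per sample
    deltas: length of the ray segment covered by each sample
    Returns the composited (r, g, b) list and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light that survives up to the current sample
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        weights.append(weight)
        for channel in range(3):
            rgb[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha            # attenuate light for later samples
    return rgb, weights
```

In this sketch a nearly opaque sample blocks everything behind it, because its weight consumes almost all of the remaining transmittance. This is the basic mechanism by which density along a ray encodes occlusion.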