We explore how to leverage neural radiance fields (NeRFs) to build interactive 3D environments from large-scale visual captures spanning buildings or even multiple city blocks, collected primarily from drone data. In contrast to the single-object scenes against which NeRFs have traditionally been evaluated, this setting poses multiple challenges, including (1) the need to incorporate thousands of images with varying lighting conditions, each of which captures only a small subset of the scene, (2) prohibitively high model capacity and ray sampling requirements beyond what can be naively trained on a single GPU, and (3) an arbitrarily large number of possible viewpoints that makes it infeasible to precompute all relevant information beforehand (as real-time NeRF renderers typically do). To address these challenges, we begin by analyzing visibility statistics for large-scale scenes, motivating a sparse network structure where parameters are specialized to different regions of the scene. We introduce a simple geometric clustering algorithm that partitions training images (or rather pixels) into different NeRF submodules that can be trained in parallel. We evaluate our approach on scenes from the Quad 6k and UrbanScene3D datasets as well as on our own drone footage, and show a 3x training speedup while improving PSNR by over 11% on average. We subsequently perform an empirical evaluation of recent NeRF fast renderers on top of Mega-NeRF and introduce a novel method that exploits temporal coherence. Our technique achieves a 40x speedup over conventional NeRF rendering while remaining within 0.5 dB in PSNR quality, exceeding the fidelity of existing fast renderers.
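The geometric partitioning idea can be illustrated with a minimal sketch: each training ray is assigned to every spatial cell whose centroid lies within some distance of the ray, so that each submodule trains only on pixels whose rays pass near its region. This is an illustrative simplification under assumed inputs (ray origins, directions, cell centroids, and a radius threshold), not the paper's exact algorithm.

```python
import numpy as np

def partition_rays(origins, directions, centroids, radius):
    """Assign each ray to every spatial cell whose centroid lies within
    `radius` of the ray. Returns, per ray, the indices of matching cells.

    origins:    (N, 3) ray origins (e.g. camera centers)
    directions: (N, 3) ray directions (need not be normalized)
    centroids:  (M, 3) centers of the spatial cells / submodules
    radius:     scalar distance threshold
    """
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    assignments = []
    for o, d in zip(origins, directions):
        v = centroids - o                  # origin -> centroid vectors
        t = np.clip(v @ d, 0.0, None)      # projection onto the ray (forward only)
        closest = o + t[:, None] * d       # closest point on the ray to each centroid
        dist = np.linalg.norm(centroids - closest, axis=1)
        assignments.append(np.nonzero(dist <= radius)[0])
    return assignments
```

Because a ray may pass near several cells, a pixel can contribute to more than one submodule; this overlap is what lets the submodules blend smoothly at region boundaries while still being trainable in parallel.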