Neural Radiance Field (NeRF) has achieved outstanding performance in modeling 3D objects and controlled scenes, usually at a single scale. In this work, we make the first attempt to bring NeRF to the city scale, with views ranging from satellite level, which captures the overview of a city, to ground-level imagery showing the complex details of an individual building. The wide span of camera distances to the scene yields multi-scale data with varying levels of detail and spatial coverage, which poses great challenges to vanilla NeRF and biases it towards compromised results. To address these issues, we introduce CityNeRF, a progressive learning paradigm that grows the NeRF model and the training set synchronously. Starting by fitting distant views with a shallow base block, the model appends new blocks as training progresses to accommodate the emerging details in increasingly closer views. This strategy effectively activates high-frequency channels in the positional encoding and unfolds more complex details as training proceeds. We demonstrate the superiority of CityNeRF in modeling diverse city-scale scenes with drastically varying views, and its support for rendering views at different levels of detail.
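To make the progressive paradigm concrete, the sketch below illustrates one possible reading of the abstract's two ideas: masking high-frequency positional-encoding channels until later training stages, and growing the network by appending blocks with residual output heads as closer views join the training set. This is a minimal illustrative sketch in PyTorch, not the authors' implementation; the names `GrowingNeRF`, `grow`, and `active_freqs`, the block widths, and the residual-head scheme are assumptions made for clarity rather than the paper's exact architecture.

```python
# Hypothetical sketch (not the paper's code): progressive positional-encoding
# masking plus block growth, mirroring the strategy described in the abstract.
import torch
import torch.nn as nn


def positional_encoding(x, num_freqs, active_freqs):
    """Sinusoidal encoding of x with frequency bands above `active_freqs`
    zeroed out, so high-frequency detail channels unlock gradually."""
    feats = [x]
    for i in range(num_freqs):
        weight = 1.0 if i < active_freqs else 0.0  # mask inactive high frequencies
        for fn in (torch.sin, torch.cos):
            feats.append(weight * fn((2.0 ** i) * x))
    return torch.cat(feats, dim=-1)


class GrowingNeRF(nn.Module):
    """A shallow base block fits distant views; deeper blocks are appended
    at later stages to model the details emerging in closer views."""

    def __init__(self, in_dim, hidden=256):
        super().__init__()
        self.base = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.blocks = nn.ModuleList()                        # grown per stage
        self.heads = nn.ModuleList([nn.Linear(hidden, 4)])   # (rgb, sigma) per stage

    def grow(self, hidden=256):
        """Append one refinement block and its output head for a new stage."""
        self.blocks.append(nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()))
        self.heads.append(nn.Linear(hidden, 4))

    def forward(self, x):
        h = self.base(x)
        out = self.heads[0](h)
        for block, head in zip(self.blocks, self.heads[1:]):
            h = block(h)              # refine features for closer views
            out = out + head(h)       # residual output adds emerging detail
        return out
```

Under this reading, each time the training set expands to a closer view scale, one would call `grow()` and raise `active_freqs`, so model capacity and encoding bandwidth increase in step with the data.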