We present a learning framework for reconstructing neural scene representations from a small number of unconstrained tourist photos. Because each image contains transient occluders, decomposing the static and transient components is necessary to build radiance fields from such in-the-wild photographs, a regime in which existing methods require large amounts of training data. We introduce SF-NeRF, which aims to disentangle these two components given only a few images, exploiting semantic information without any supervision. The proposed method contains an occlusion filtering module that predicts the transient color and its opacity for each pixel, enabling the NeRF model to learn the static scene representation alone. This filtering module learns transient phenomena guided by pixel-wise semantic features obtained from a trainable image encoder, which can be trained across multiple scenes to learn a prior over transient objects. Furthermore, we present two techniques that prevent ambiguous decomposition and noisy outputs from the filtering module. We demonstrate that our method outperforms state-of-the-art novel view synthesis methods on the Phototourism dataset in the few-shot setting.
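To make the role of the occlusion filtering module concrete, the following is a minimal sketch of the kind of per-pixel compositing such a module might perform: a predicted transient color and opacity are alpha-blended over the static NeRF color, so that the static branch is only supervised where the transient layer is transparent. The function name and the exact blend formula here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def composite_pixel(static_rgb, transient_rgb, transient_alpha):
    """Alpha-blend a predicted transient layer over the static NeRF color.

    Assumed composition (an illustrative convention, not the paper's exact
    formula): C = alpha * transient + (1 - alpha) * static, per pixel.

    static_rgb, transient_rgb: arrays of shape (..., 3) in [0, 1].
    transient_alpha: array of shape (...) in [0, 1]; 1 means fully occluded.
    """
    # Clip opacity to a valid range and broadcast it over the RGB channels.
    alpha = np.clip(transient_alpha, 0.0, 1.0)[..., None]
    return alpha * transient_rgb + (1.0 - alpha) * static_rgb
```

Where the transient opacity is zero, the rendered pixel reduces to the static color, which is what lets the NeRF branch learn only the persistent scene.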