Neural volumetric representations have shown that MLP networks can be trained with multi-view calibrated images to represent scene geometry and appearance, without explicit 3D supervision. Object segmentation can enrich many downstream applications based on the learned radiance field. However, introducing hand-crafted segmentation to define regions of interest in a complex real-world scene is non-trivial and expensive, as it requires per-view annotations. This paper explores self-supervised learning for object segmentation using NeRF in complex real-world scenes. Our framework, NeRF-SOS, couples object segmentation and the neural radiance field to segment objects from any view within a scene. By proposing a novel collaborative contrastive loss at both the appearance and geometry levels, NeRF-SOS encourages NeRF models to distill compact, geometry-aware segmentation clusters from their density fields and self-supervised pre-trained 2D visual features. The self-supervised object segmentation framework can be applied to various NeRF models, yielding both photo-realistic rendering results and convincing segmentations for indoor and outdoor scenarios. Extensive results on the LLFF and Tanks and Temples datasets validate the effectiveness of NeRF-SOS. It consistently surpasses other image-based self-supervised baselines and even captures finer details than supervised Semantic-NeRF.