Novel view synthesis and 3D modeling using implicit neural field representations have proven highly effective for calibrated multi-view cameras. Such representations are known to benefit from additional geometric and semantic supervision. Most existing methods that exploit additional supervision require dense pixel-wise labels or localized scene priors; they cannot benefit from high-level, vague scene priors given as scene descriptions. In this work, we aim to leverage the geometric prior of Manhattan scenes to improve implicit neural radiance field representations. More precisely, we assume only that the indoor scene under investigation is Manhattan -- with no additional information whatsoever -- and that its Manhattan coordinate frame is unknown. This high-level prior is used to self-supervise the surface normals derived explicitly from the implicit neural fields. Our modeling allows us to group the derived normals and exploit their orthogonality constraints for self-supervision. Exhaustive experiments on datasets of diverse indoor scenes demonstrate the significant benefit of the proposed method over established baselines.
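The self-supervision described above can be illustrated with a minimal sketch. Assume the field-derived unit surface normals are available as an array, and that the unknown Manhattan frame is parameterized as a rotation matrix `R` that is optimized jointly with the field (the function name `manhattan_normal_loss` and this particular soft-grouping form are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def manhattan_normal_loss(normals, R):
    """Self-supervision loss for Manhattan scenes (illustrative sketch).

    normals: (N, 3) unit surface normals derived from the implicit field.
    R:       (3, 3) rotation matrix whose columns are the three mutually
             orthogonal axes of the (unknown, jointly estimated) Manhattan
             frame; orthogonality holds by construction of a rotation.
    """
    # Absolute cosine similarity of each normal with each axis, shape (N, 3).
    # The sign is dropped because opposite walls share an axis direction.
    cos = np.abs(normals @ R)
    # Group each normal with its best-aligned Manhattan axis.
    best = cos.max(axis=1)
    # The loss vanishes iff every normal is parallel to some Manhattan axis.
    return float(np.mean(1.0 - best))
```

In a full pipeline this term would be added to the photometric rendering loss, with gradients flowing both into the neural field (through the derived normals) and into the frame parameters behind `R`.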