Novel view synthesis and 3D modeling using implicit neural field representations have proven highly effective for calibrated multi-view cameras. Such representations are known to benefit from additional geometric and semantic supervision. However, most existing methods that exploit such supervision require dense pixel-wise labels or localized scene priors; they cannot benefit from high-level, vague scene priors given only as scene descriptions. In this work, we aim to leverage the geometric prior of Manhattan scenes to improve implicit neural radiance field representations. More precisely, we assume only that the scene under investigation is Manhattan, with an unknown Manhattan coordinate frame and no additional information whatsoever. This high-level prior is then used to self-supervise the surface normals derived explicitly from the implicit neural fields. Our modeling allows us to group the derived normals and exploit their orthogonality constraints for self-supervision. Exhaustive experiments on datasets of diverse indoor scenes demonstrate the significant benefit of the proposed method over established baselines.
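To make the core idea concrete, the sketch below illustrates (under stated assumptions, not as the authors' implementation) the two ingredients the abstract names: deriving surface normals as the normalized gradient of an implicit field, and a self-supervised loss that encourages those normals to align with three mutually orthogonal but unknown Manhattan axes. The field `sdf`, the learnable frame `frame`, the soft sign-invariant assignment of normals to axes, and all shapes are illustrative assumptions.

```python
# Minimal PyTorch sketch of Manhattan self-supervision on field-derived
# normals. Names (`sdf`, `frame`) and the loss form are assumptions for
# illustration, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def derive_normals(sdf, points):
    """Surface normals as the normalized gradient of an implicit field."""
    points = points.requires_grad_(True)
    values = sdf(points)
    grads, = torch.autograd.grad(values.sum(), points, create_graph=True)
    return F.normalize(grads, dim=-1)

def manhattan_orthogonality_loss(normals, frame):
    """Self-supervision from the Manhattan prior.

    `frame` is a learnable 3x3 matrix whose rows estimate the unknown
    Manhattan coordinate frame. Each normal is assigned to the axis it
    aligns with best (up to sign) and penalized for deviating from it.
    """
    # Orthonormalize the estimated frame so its rows are valid axes.
    q, _ = torch.linalg.qr(frame)
    axes = q.T                              # (3, 3): rows are the three axes
    cos = normals @ axes.T                  # (N, 3): cosine to each axis
    # Grouping step: alignment with the best-matching axis, sign-invariant.
    alignment, _ = cos.abs().max(dim=-1)
    return (1.0 - alignment).mean()

# Toy usage with an analytic sphere SDF; in practice `sdf` would be the
# trained neural field and `frame` would be optimized jointly with it.
sdf = lambda p: p.norm(dim=-1) - 1.0
pts = F.normalize(torch.randn(1024, 3), dim=-1)
frame = torch.nn.Parameter(torch.eye(3) + 0.1 * torch.randn(3, 3))
loss = manhattan_orthogonality_loss(derive_normals(sdf, pts), pts.new_tensor(frame.detach()) + frame - frame.detach())
```

Because the Manhattan frame is unknown, it is kept as a learnable parameter and re-orthonormalized on the fly; the hard `max` assignment here stands in for whatever grouping scheme the method actually uses.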