This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images. Previous methods have shown impressive reconstruction results on textured objects, but they still struggle with low-textured planar regions, which are common in indoor scenes. One approach to this issue is to incorporate planar constraints into depth map estimation in multi-view stereo-based methods, but the per-view plane estimation and depth optimization are inefficient and lack multi-view consistency. In this work, we show that planar constraints can be conveniently integrated into recent implicit neural representation-based reconstruction methods. Specifically, we use an MLP network to represent the scene geometry as a signed distance function. Based on the Manhattan-world assumption, planar constraints are employed to regularize the geometry in floor and wall regions predicted by a 2D semantic segmentation network. To cope with inaccurate segmentation, we encode the semantics of 3D points with another MLP and design a novel loss that jointly optimizes the scene geometry and semantics in 3D space. Experiments on the ScanNet and 7-Scenes datasets show that the proposed method outperforms previous methods by a large margin in 3D reconstruction quality. The code is available at https://zju3dv.github.io/manhattan_sdf.
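To make the Manhattan-world planar constraint concrete, the following is a minimal sketch of how such a regularizer on surface normals could look. It is an illustration under stated assumptions, not the released implementation: the abstract does not give the exact loss, so the function names (`floor_loss`, `wall_loss`, `joint_loss`), the fixed `UP` axis, the horizontal `wall_dir` vector, and the probability-weighted sum are all hypothetical choices. In an SDF-based method the normals would come from the gradient of the signed distance MLP and the probabilities from the semantic MLP; here both are plain arrays.

```python
import numpy as np

# Assumed world "up" axis; a real pipeline would align the scene so that
# the floor normal matches this direction.
UP = np.array([0.0, 0.0, 1.0])


def floor_loss(normals, up=UP):
    """Penalty that is zero when a unit normal points straight up."""
    return np.abs(1.0 - normals @ up)


def wall_loss(normals, wall_dir):
    """Penalty that is zero when a unit normal is parallel, antiparallel,
    or perpendicular to the (assumed horizontal, unit) wall direction."""
    cos = normals @ wall_dir
    # One candidate per allowed cosine value: -1, 0, +1.
    candidates = np.stack(
        [np.abs(cos + 1.0), np.abs(cos), np.abs(cos - 1.0)], axis=-1
    )
    return candidates.min(axis=-1)


def joint_loss(normals, p_floor, p_wall, wall_dir, up=UP):
    """Weight each geometric penalty by the predicted semantic probability,
    so geometry and semantics influence each other (hypothetical form)."""
    per_point = p_floor * floor_loss(normals, up) + p_wall * wall_loss(normals, wall_dir)
    return per_point.mean()


if __name__ == "__main__":
    normals = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.6, 0.8, 0.0]])
    p_floor = np.array([1.0, 0.0, 0.0])
    p_wall = np.array([0.0, 1.0, 1.0])
    wall_dir = np.array([1.0, 0.0, 0.0])
    print(joint_loss(normals, p_floor, p_wall, wall_dir))
```

Weighting by probabilities rather than hard labels is what lets a point with a wrong 2D label lower its penalty by changing its predicted class instead of distorting the geometry. In practice a term that keeps the semantic prediction close to the 2D segmentation would also be needed to avoid the trivial solution of zero floor and wall probability.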