Neural implicit surfaces have become an important technique for multi-view 3D reconstruction, but their accuracy remains limited. In this paper, we argue that this limitation comes from the difficulty of learning and rendering high-frequency textures with neural networks. We thus propose adding a direct photo-consistency term across the different views to the standard neural rendering optimization. Intuitively, we optimize the implicit geometry so that it warps views onto each other in a consistent way. We demonstrate that two elements are key to the success of such an approach: (i) warping entire patches, using the predicted occupancy and normals of the 3D points along each ray, and measuring their similarity with a robust structural similarity measure (SSIM); (ii) handling visibility and occlusion in such a way that incorrect warps are not given too much importance, while encouraging a reconstruction that is as complete as possible. We evaluate our approach, dubbed NeuralWarp, on the standard DTU and EPFL benchmarks and show that it outperforms state-of-the-art unsupervised implicit surface reconstruction methods by over 20% on both datasets.
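To make the patch-similarity term more concrete, the following is a minimal sketch of a single-window SSIM between a reference patch and a warped patch, averaged over the patches flagged as visible. It is an illustration under stated assumptions, not the NeuralWarp implementation: the function names `ssim` and `patch_photo_loss`, the flat-list patch layout, the `(1 - SSIM) / 2` dissimilarity, and the binary `valid` mask are all hypothetical choices, and the abstract does not specify how the actual method weights visibility.

```python
def ssim(patch_a, patch_b, data_range=1.0):
    """Single-window SSIM between two equally sized grayscale patches.

    Each patch is a flat list of intensities in [0, data_range]
    (an assumed layout chosen for this sketch).
    """
    n = len(patch_a)
    if n == 0 or n != len(patch_b):
        raise ValueError("patches must be non-empty and the same size")

    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    mu_a = sum(patch_a) / n
    mu_b = sum(patch_b) / n
    var_a = sum((a - mu_a) ** 2 for a in patch_a) / n
    var_b = sum((b - mu_b) ** 2 for b in patch_b) / n
    cov = sum((a - mu_a) * (b - mu_b) for a, b in zip(patch_a, patch_b)) / n

    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def patch_photo_loss(ref_patches, warped_patches, valid):
    """Mean (1 - SSIM) / 2 over the patches whose mask entry is True.

    `valid` is a hypothetical per-patch visibility mask; patches marked
    False (for example, occluded in the source view) are ignored.
    Returns 0.0 when no patch is valid.
    """
    losses = [
        (1.0 - ssim(ref, warped)) / 2.0
        for ref, warped, ok in zip(ref_patches, warped_patches, valid)
        if ok
    ]
    return sum(losses) / len(losses) if losses else 0.0
```

In this sketch, a patch compared with itself yields an SSIM of 1 up to floating-point rounding, so its contribution to the loss is zero, while a patch with inverted contrast yields a lower SSIM and a positive loss. Masking out a patch removes its contribution entirely, which is only a crude stand-in for the softer visibility handling the abstract alludes to.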