Finding accurate correspondences among different views is the Achilles' heel of unsupervised Multi-View Stereo (MVS). Existing methods are built upon the assumption that corresponding pixels share similar photometric features. However, multi-view images in real-world scenarios observe non-Lambertian surfaces and suffer from occlusions, which break this assumption. In this work, we propose a novel approach with neural rendering (RC-MVSNet) to resolve such ambiguity in cross-view correspondences. Specifically, we impose a depth rendering consistency loss that constrains geometry features close to the object surface to alleviate occlusions. Concurrently, we introduce a reference view synthesis loss to generate consistent supervision, even for non-Lambertian surfaces. Extensive experiments on the DTU and Tanks & Temples benchmarks demonstrate that RC-MVSNet achieves state-of-the-art performance among unsupervised MVS frameworks and is competitive with many supervised methods. The code is released at https://github.com/Boese0601/RC-MVSNet
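The depth rendering consistency idea above can be sketched with NeRF-style volume rendering: densities sampled along a ray are converted into weights, the expected depth is rendered from those weights, and an L1 term pulls it toward the depth predicted by the MVS branch. This is a minimal illustrative sketch, not the paper's implementation; the function names and the simple L1 form are assumptions.

```python
import numpy as np

def render_depth(sigmas, t_vals):
    """Volume-render an expected depth along one ray (NeRF-style quadrature).

    sigmas: per-sample densities along the ray.
    t_vals: sample depths along the ray, in ascending order.
    """
    # Distances between consecutive samples; the last interval is open-ended.
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)
    # Per-sample opacity from density and interval length.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Accumulated transmittance: probability the ray reaches each sample.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    # Expected (rendered) depth is the weight-averaged sample depth.
    return np.sum(weights * t_vals)

def depth_consistency_loss(rendered_depth, mvs_depth):
    # Illustrative L1 penalty tying the rendered depth to the MVS depth.
    return abs(rendered_depth - mvs_depth)
```

For a ray whose density is concentrated at one sample (e.g. a sharp surface at depth 2.0), the rendered depth collapses to that sample, so the loss against an MVS depth of 2.0 is near zero.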