Multi-view depth estimation plays a critical role in reconstructing and understanding the 3D world, and recent learning-based methods have made significant progress on this task. However, multi-view depth estimation is fundamentally a correspondence-based optimization problem, whereas previous learning-based methods mainly rely on predefined depth hypotheses to build correspondences in the form of a cost volume and implicitly regularize that volume to fit the depth prediction. This deviates from the essence of iterative optimization based on stereo correspondence, so these methods suffer from unsatisfactory precision and generalization capability. In this paper, we are the first to explore more general image correlations to establish correspondences dynamically for depth estimation. We design a novel iterative multi-view depth estimation framework that mimics the optimization process and consists of 1) a correlation volume construction module that models the pixel similarity between a reference image and source images as all-to-all correlations; 2) a flow-based depth initialization module that estimates the depth from the 2D optical flow; and 3) a novel correlation-guided depth refinement module that reprojects points into different views to fetch the relevant correlations, fuses them, and integrates the fused correlations for iterative depth updates. Without predefined depth hypotheses, the fused correlations establish multi-view correspondence efficiently and guide the depth refinement heuristically. We conduct extensive experiments on ScanNet, DeMoN, ETH3D, and 7Scenes to demonstrate the superiority of our method on multi-view depth estimation and its strong generalization ability.
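To make the first and third modules more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that pixel similarity is a plain dot product between per-pixel feature vectors, that cameras follow a pinhole model with intrinsics (fx, fy, cx, cy), and that correlations are fetched by nearest-neighbour lookup. The function names `correlation_volume`, `reproject`, and `fetch_correlation` are hypothetical, and the learned feature extractor, the fusion step, and the iterative update operator are omitted.

```python
def correlation_volume(ref_feats, src_feats):
    """All-to-all correlation between two flattened feature maps.

    ref_feats: list of N feature vectors (one per reference pixel).
    src_feats: list of M feature vectors (one per source pixel).
    Returns an N x M nested list of dot products (assumed similarity measure).
    """
    return [
        [sum(a * b for a, b in zip(f_ref, f_src)) for f_src in src_feats]
        for f_ref in ref_feats
    ]


def reproject(u, v, depth, intrinsics, rotation, translation):
    """Map a reference pixel (u, v) with a depth estimate into a source view.

    intrinsics: (fx, fy, cx, cy), assumed shared by both views.
    rotation: 3x3 nested list, translation: length-3 list (reference to source).
    Returns (u_src, v_src), or None if the point lies behind the source camera.
    """
    fx, fy, cx, cy = intrinsics
    # Back-project the pixel to a 3D point in the reference camera frame.
    point = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Apply the rigid transform into the source camera frame.
    moved = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if moved[2] <= 0:
        return None
    # Project back onto the source image plane.
    return (fx * moved[0] / moved[2] + cx, fy * moved[1] / moved[2] + cy)


def fetch_correlation(corr_row, u_src, v_src, width, height):
    """Nearest-neighbour lookup of one reference pixel's correlation row.

    corr_row: the M = width * height correlations of a single reference pixel,
    stored in row-major order over the source image.
    Returns 0.0 when the reprojected location falls outside the source image.
    """
    col, row = int(round(u_src)), int(round(v_src))
    if not (0 <= col < width and 0 <= row < height):
        return 0.0
    return corr_row[row * width + col]
```

In this sketch, each depth update would change where `reproject` lands in the source view and therefore which correlation value is fetched, which is how correspondences can be established without a fixed set of depth hypotheses. A practical version would likely use bilinear sampling over a local window rather than a single nearest-neighbour value.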