Recent work in multi-view stereo (MVS) combines learnable photometric scores and regularization with PatchMatch-based optimization to achieve robust pixelwise estimates of depth, normals, and visibility. However, non-learning-based methods still outperform on large scenes with sparse views, in part due to their use of geometric consistency constraints and their ability to optimize over many views at high resolution. In this paper, we build on learning-based approaches to improve photometric scores by learning patch coplanarity, and we encourage geometric consistency by learning a scaled photometric cost that can be combined with reprojection error. We also propose an adaptive pixel sampling strategy for candidate propagation that reduces memory, enabling training at higher resolution with more views and a larger encoder. These modifications yield 6-15% gains in accuracy and completeness on the challenging ETH3D benchmark, resulting in higher F1 scores than the widely used state-of-the-art non-learning approaches ACMM and ACMP.