Multi-view stereo (MVS) is an important and still challenging research task in computer vision. In recent years, deep learning-based methods have shown superior performance on this task. Cost volume pyramid network-based methods, which progressively refine the depth map in a coarse-to-fine manner, have yielded promising results while consuming less memory. However, these methods do not fully consider the characteristics of the cost volume at each stage, and consequently adopt similar range search strategies for every stage. In this work, we present a novel cost volume pyramid-based network for multi-view stereo that uses a different search strategy at each stage. By choosing stage-specific depth range sampling strategies and applying adaptive unimodal filtering, we obtain more accurate depth estimates in the low-resolution stages and iteratively upsample the depth map to arbitrary resolution. We conducted extensive experiments on both the DTU and BlendedMVS datasets, and the results show that our method outperforms most state-of-the-art methods.
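To make the idea of per-stage search strategies and unimodal supervision concrete, here is a minimal plain-Python sketch for a single pixel. Everything in it is an assumption for illustration, not the paper's actual method: the function names are hypothetical, inverse-depth sampling over the full range at the coarsest stage and a geometrically shrinking local interval at finer stages are one plausible schedule, and the Laplacian-shaped target follows the general style of adaptive unimodal cost volume filtering (as in AcfNet). A real network would operate on tensors and predict the sharpness `sigma` per pixel rather than taking a scalar.

```python
import math
from typing import List, Optional


def inverse_depth_hypotheses(d_min: float, d_max: float, n: int) -> List[float]:
    """Coarse stage (assumed): n depths spaced evenly in inverse depth over
    [d_min, d_max], so samples are denser near the camera. Ascending order."""
    inv_near, inv_far = 1.0 / d_min, 1.0 / d_max
    step = (inv_near - inv_far) / (n - 1)
    return [1.0 / (inv_near - i * step) for i in range(n)]


def local_hypotheses(prev_depth: float, interval: float, n: int) -> List[float]:
    """Finer stages (assumed): n depths centred on the upsampled estimate
    from the previous stage, spaced by a fixed interval."""
    half = (n - 1) / 2
    return [prev_depth + (i - half) * interval for i in range(n)]


def stage_hypotheses(stage: int, d_min: float, d_max: float,
                     prev_depth: Optional[float], n: int,
                     base_interval: float, shrink: float = 0.5) -> List[float]:
    """Hypothetical per-stage schedule: global inverse-depth sampling at
    stage 0, then a local range whose interval shrinks with each finer stage."""
    if stage == 0:
        return inverse_depth_hypotheses(d_min, d_max, n)
    if prev_depth is None:
        raise ValueError("finer stages need the previous stage's depth")
    return local_hypotheses(prev_depth, base_interval * shrink ** stage, n)


def unimodal_target(hypotheses: List[float], d_gt: float, sigma: float) -> List[float]:
    """Laplacian-shaped distribution peaked at the ground-truth depth, used to
    supervise the cost volume; a smaller sigma gives a sharper peak. In
    adaptive unimodal filtering sigma would vary per pixel; here it is a scalar."""
    logits = [-abs(d - d_gt) / sigma for d in hypotheses]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_depth(probs: List[float], hypotheses: List[float]) -> float:
    """Soft-argmin style regression: probability-weighted mean depth."""
    return sum(p * d for p, d in zip(probs, hypotheses))


if __name__ == "__main__":
    coarse = stage_hypotheses(0, 1.0, 4.0, None, 4, base_interval=1.0)
    fine = stage_hypotheses(1, 1.0, 4.0, prev_depth=3.0, n=5, base_interval=1.0)
    print(fine)  # [2.0, 2.5, 3.0, 3.5, 4.0]
    print(expected_depth(unimodal_target(fine, 3.0, 0.5), fine))
```

The design choice being illustrated is that the coarsest stage must cover the whole scene depth range, while later stages only need a narrow band around the upsampled previous estimate, so each stage can reasonably use a different sampling rule; a sharp unimodal target encourages the coarse stages to commit to a single depth mode.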