Multi-view stereo (MVS) is a crucial task for precise 3D reconstruction. Most recent studies have tried to improve the matching cost volume in MVS by designing aggregated 3D cost volumes and their regularization. This paper instead focuses on learning a robust feature extraction network that improves the matching costs without heavy computation in the other steps. In particular, we present a dynamic-scale feature extraction network named CDSFNet. It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel, guided by the normal curvature of the image surface. As a result, CDSFNet can estimate the optimal patch scales to learn discriminative features for accurate matching between reference and source images. By combining the robust extracted features with an appropriate cost formulation strategy, our resulting MVS architecture can estimate depth maps more precisely. Extensive experiments show that the proposed method outperforms other state-of-the-art methods on complex outdoor scenes and significantly improves the completeness of the reconstructed models. Because it avoids heavy computation outside feature extraction, the method can also process higher-resolution inputs with faster run-time and lower memory usage than other MVS methods. Our source code is available at https://github.com/TruongKhang/cds-mvsnet.
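To make the idea of curvature-guided scale selection more concrete, the following is a minimal sketch and not the CDSFNet implementation. It treats a grayscale image as the surface z = I(x, y) and evaluates the standard differential-geometry expression for normal curvature along a unit direction u, namely (uᵀHu) / ((1 + (∇I·u)²)·sqrt(1 + |∇I|²)), using central finite differences whose spacing stands in for the patch scale. The function names `make_image`, `normal_curvature`, and `select_scale`, the candidate steps, and the "smallest absolute curvature" selection rule are all assumptions made for illustration; the abstract does not specify how the network maps curvature to a scale.

```python
import math


def make_image(fn, size):
    """Build a size x size image as nested lists, with img[y][x] = fn(x, y)."""
    return [[fn(x, y) for x in range(size)] for y in range(size)]


def normal_curvature(img, y, x, step, direction):
    """Normal curvature of the surface z = img[y][x] at (y, x) along `direction`.

    Derivatives are central finite differences with spacing `step`, which is
    used here as a stand-in for the patch scale (an assumption of this sketch).
    `direction` is (dx, dy) and is normalised to unit length.
    """
    dx, dy = direction
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm
    s = step

    # First derivatives (image gradient).
    ix = (img[y][x + s] - img[y][x - s]) / (2 * s)
    iy = (img[y + s][x] - img[y - s][x]) / (2 * s)

    # Second derivatives (Hessian entries).
    ixx = (img[y][x + s] - 2 * img[y][x] + img[y][x - s]) / (s * s)
    iyy = (img[y + s][x] - 2 * img[y][x] + img[y - s][x]) / (s * s)
    ixy = (img[y + s][x + s] - img[y + s][x - s]
           - img[y - s][x + s] + img[y - s][x - s]) / (4 * s * s)

    # Second fundamental form over first fundamental form for a graph surface.
    second = ixx * ux * ux + 2 * ixy * ux * uy + iyy * uy * uy
    first = 1 + (ix * ux + iy * uy) ** 2
    return second / (first * math.sqrt(1 + ix * ix + iy * iy))


def select_scale(img, y, x, direction, candidate_steps=(1, 2, 3)):
    """Hypothetical rule: keep the candidate step with the smallest |curvature|."""
    return min(
        candidate_steps,
        key=lambda s: abs(normal_curvature(img, y, x, s, direction)),
    )


if __name__ == "__main__":
    # A surface that is curved along x and flat along y.
    bowl = make_image(lambda x, y: 0.5 * (x - 4) ** 2, 9)
    print(normal_curvature(bowl, 4, 4, 1, (1, 0)))
    print(normal_curvature(bowl, 4, 4, 1, (0, 1)))
    print(select_scale(bowl, 4, 4, (1, 0)))
```

The pixel must lie at least the largest candidate step away from the image border, since the sketch does no padding. In a learned setting, a hard choice like the one in `select_scale` would presumably be replaced by a differentiable weighting over candidate scales, but that is a design guess rather than something stated in the abstract.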