We present a novel deep-learning-based method for Multi-View Stereo. Our method estimates high-resolution, highly precise depth maps iteratively by traversing the continuous space of feasible depth values at each pixel through a sequence of binary decisions. The decision process relies on a deep network that computes a pixelwise binary mask indicating whether each pixel's actual depth lies in front of or behind its current depth hypothesis. To handle occluded regions, at each iteration the results from different source images are fused using pixelwise weights estimated by a second network. Because the binary decision strategy explores the depth space efficiently, our method can handle high-resolution images without sacrificing either resolution or precision. This sets it apart from most alternative learning-based Multi-View Stereo methods, where the explicit discretization of the depth space requires processing large cost volumes. We compare our method with state-of-the-art Multi-View Stereo methods on the DTU, Tanks and Temples, and the challenging ETH3D benchmarks and show competitive results.
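To make the binary decision idea concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes one plausible reading of the abstract: each pixel keeps a depth interval that is halved at every iteration according to a "behind / in front" decision. The names `refine_depth_binary`, `fuse_decisions`, and `decide` are hypothetical, and the two networks are replaced by stand-ins: an oracle that compares against a known depth map, and a plain weighted average of per-source probabilities.

```python
import numpy as np


def fuse_decisions(probs, weights):
    """Fuse per-source 'behind' probabilities with pixelwise weights.

    probs, weights: arrays of shape (S, H, W) for S source images.
    Stand-in for the second (weighting) network: weights are simply
    normalised per pixel and used in a weighted average.
    """
    total = np.clip(weights.sum(axis=0, keepdims=True), 1e-8, None)
    fused = (weights / total * probs).sum(axis=0)
    return fused > 0.5  # True where depth is judged to lie behind the hypothesis


def refine_depth_binary(decide, d_min, d_max, shape, num_iters=8):
    """Per-pixel interval halving driven by a binary decision.

    decide(hyp) -> boolean (H, W) array, True where the actual depth is
    judged to be behind (farther than) the hypothesis `hyp`.
    """
    lo = np.full(shape, d_min, dtype=np.float64)
    hi = np.full(shape, d_max, dtype=np.float64)
    for _ in range(num_iters):
        hyp = 0.5 * (lo + hi)            # current per-pixel depth hypothesis
        behind = decide(hyp)
        lo = np.where(behind, hyp, lo)   # depth is farther: raise lower bound
        hi = np.where(behind, hi, hyp)   # depth is closer: lower upper bound
    return 0.5 * (lo + hi)


# Usage with an oracle in place of the decision network (assumption).
rng = np.random.default_rng(0)
true_depth = rng.uniform(1.0, 5.0, size=(4, 6))
estimate = refine_depth_binary(lambda hyp: true_depth > hyp, 1.0, 5.0, true_depth.shape)
```

Under these assumptions the depth interval halves at every step, so the per-pixel error after `num_iters` iterations is at most `(d_max - d_min) / 2 ** (num_iters + 1)`. The sketch never allocates a cost volume over discretized depths, which is the contrast the abstract draws.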