Recurrent All-Pairs Field Transforms (RAFT) has shown great potential in matching tasks. However, all-pairs correlations lack non-local geometry knowledge and have difficulty tackling local ambiguities in ill-posed regions. In this paper, we propose Iterative Geometry Encoding Volume (IGEV-Stereo), a new deep network architecture for stereo matching. The proposed IGEV-Stereo builds a combined geometry encoding volume that encodes geometry and context information as well as local matching details, and iteratively indexes it to update the disparity map. To speed up convergence, we exploit the GEV to regress an accurate starting point for the ConvGRU iterations. Our IGEV-Stereo ranks $1^{st}$ on KITTI 2015 and 2012 (Reflective) among all published methods and is the fastest among the top 10 methods. In addition, IGEV-Stereo exhibits strong cross-dataset generalization as well as high inference efficiency. We also extend our IGEV to multi-view stereo (MVS), i.e., IGEV-MVS, which achieves competitive accuracy on the DTU benchmark. Code is available at https://github.com/gangweiX/IGEV.
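The pipeline the abstract describes — build a cost volume from matching features, regress an initial disparity from it, then iteratively index the volume to refine the disparity — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature maps are random stand-ins, `lookup` and the update step are toy placeholders for IGEV's learned GEV and ConvGRU modules, and all shapes are illustrative.

```python
import numpy as np

H, W, D = 4, 8, 6  # image height, width, and disparity range (toy sizes)

rng = np.random.default_rng(0)
f_left = rng.standard_normal((H, W, 16))   # stand-in left feature map
f_right = rng.standard_normal((H, W, 16))  # stand-in right feature map

# All-pairs-style cost volume: feature correlation at each candidate disparity d.
volume = np.zeros((H, W, D))
for d in range(D):
    shifted = np.roll(f_right, d, axis=1)  # shift right features by disparity d
    volume[:, :, d] = (f_left * shifted).sum(-1)

def lookup(volume, disp, r=1):
    """Index the volume in a window of radius r around the current disparity."""
    d = np.clip(np.rint(disp).astype(int), 0, volume.shape[2] - 1)
    i, j = np.indices(disp.shape)
    return np.stack(
        [volume[i, j, np.clip(d + o, 0, volume.shape[2] - 1)]
         for o in range(-r, r + 1)],
        axis=-1,
    )

# Initial disparity via soft-argmax over the volume (stand-in for the GEV
# regression that provides an accurate starting point).
probs = np.exp(volume) / np.exp(volume).sum(-1, keepdims=True)
disp = (probs * np.arange(D)).sum(-1)

# Iterative refinement (stand-in for the ConvGRU update steps): nudge the
# disparity toward higher correlation using the local slope of the cost.
for _ in range(3):
    feats = lookup(volume, disp)          # geometry features at current disparity
    step = feats[..., 2] - feats[..., 0]  # cost(d+1) - cost(d-1)
    disp = np.clip(disp + 0.1 * step, 0, D - 1)

print(disp.shape)  # one disparity estimate per pixel
```

In the actual network, the lookup features (together with context features) drive a learned ConvGRU that predicts a disparity residual at each iteration, rather than the hand-written slope step used here.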