In this paper, we propose CGI-Stereo, a novel neural network architecture that can concurrently achieve real-time performance, competitive accuracy, and strong generalization ability. The core of our CGI-Stereo is a Context and Geometry Fusion (CGF) block which adaptively fuses context and geometry information for more effective cost aggregation, while also providing feedback to feature learning to guide more effective contextual feature extraction. The proposed CGF can be easily embedded into many existing stereo matching networks, such as PSMNet, GwcNet and ACVNet. The resulting networks show a significant improvement in accuracy. Specifically, the model which incorporates our CGF with ACVNet ranks $1^{st}$ on the KITTI 2012 and 2015 leaderboards among all published methods. We further propose an informative and concise cost volume, named Attention Feature Volume (AFV), which exploits a correlation volume as attention weights to filter a feature volume. Based on CGF and AFV, the proposed CGI-Stereo outperforms all other published real-time methods on the KITTI benchmarks and shows better generalization ability than other real-time methods. Code is available at https://github.com/gangweiX/CGI-Stereo.
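The AFV construction described above can be illustrated with a minimal numpy sketch. This is not the paper's implementation; the shapes, variable names, and the use of a disparity-wise softmax are assumptions made for illustration: a correlation volume is normalized over the disparity dimension and used as attention weights that filter a feature volume.

```python
import numpy as np

# Hypothetical shapes: feat_c feature channels, D disparity levels, H x W spatial size.
rng = np.random.default_rng(0)
feat_c, D, H, W = 8, 4, 5, 5

# Correlation volume (D, H, W): per-disparity similarity between left/right features.
corr = rng.standard_normal((D, H, W))
# Feature volume (feat_c, D, H, W), e.g. left-image features tiled across disparities.
feat_vol = rng.standard_normal((feat_c, D, H, W))

# Numerically stable softmax over the disparity axis -> attention weights.
att = np.exp(corr - corr.max(axis=0, keepdims=True))
att = att / att.sum(axis=0, keepdims=True)

# Attention Feature Volume: correlation attention filters the feature volume.
afv = att[None] * feat_vol  # broadcast over the channel axis -> (feat_c, D, H, W)
print(afv.shape)
```

In practice the correlation and feature volumes would come from learned CNN features of the stereo pair; the sketch only shows how a single correlation volume can gate a richer feature volume to produce a compact yet informative cost volume.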