Despite the remarkable progress made by learning-based stereo matching algorithms, one key challenge remains unsolved. Current state-of-the-art stereo models are mostly based on costly 3D convolutions, whose cubic computational complexity and high memory consumption make them expensive to deploy in real-world applications. In this paper, we aim to completely replace the commonly used 3D convolutions to achieve fast inference while maintaining comparable accuracy. To this end, we first propose a sparse-points-based intra-scale cost aggregation method to alleviate the well-known edge-fattening issue at disparity discontinuities. Further, we approximate the traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions. Both modules are simple, lightweight, and complementary, leading to an effective and efficient architecture for cost aggregation. With these two modules, we can not only significantly speed up existing top-performing models (e.g., $41\times$ faster than GC-Net, $4\times$ faster than PSMNet, and $38\times$ faster than GA-Net), but also improve the performance of fast stereo models (e.g., StereoNet). We also achieve competitive results on the Scene Flow and KITTI datasets while running at 62 ms, demonstrating the versatility and high efficiency of the proposed method. Our full framework is available at https://github.com/haofeixu/aanet .
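To make the cost argument concrete, the following is a minimal back-of-the-envelope sketch, not a measurement from this work. It counts multiply-accumulate operations (MACs) for a single 3D convolution over a 4D cost volume and for a single 2D convolution over a 3D cost volume whose disparity candidates are treated as channels. The tensor sizes and the helper names `conv3d_macs` and `conv2d_macs` are assumptions chosen only for illustration.

```python
def conv3d_macs(d, h, w, c_in, c_out, k=3):
    """MACs for one k x k x k 3D convolution over a C x D x H x W cost volume."""
    return d * h * w * c_in * c_out * k ** 3


def conv2d_macs(h, w, c_in, c_out, k=3):
    """MACs for one k x k 2D convolution over a C x H x W feature map."""
    return h * w * c_in * c_out * k ** 2


# Hypothetical sizes for illustration only (not taken from the paper).
D, H, W = 48, 96, 192   # disparity candidates, height, width
C = 32                  # feature channels of the 4D cost volume

# One 3D convolution layer on a C x D x H x W volume.
macs_3d = conv3d_macs(D, H, W, C, C)

# One 2D convolution layer on a D x H x W volume (assumption: the D
# disparity candidates act as input and output channels).
macs_2d = conv2d_macs(H, W, D, D)

print(f"3D conv: {macs_3d:,} MACs")
print(f"2D conv: {macs_2d:,} MACs")
print(f"ratio:   {macs_3d / macs_2d:.1f}x")
```

Under these assumed sizes the ratio simplifies to $3C^2/D$, so the per-layer gap grows quadratically with the channel width of the 4D volume. Actual speedups depend on the full architecture and hardware, which this count does not model.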