Despite the remarkable progress of deep learning in stereo matching, there remains a gap in accuracy between real-time models and the slower state-of-the-art models that are less suitable for practical applications. This paper presents an iterative multi-scale coarse-to-fine refinement (iCFR) framework to bridge this gap: it can adopt the matching network of any stereo model to make it faster, more efficient, and scalable while maintaining comparable accuracy. To reduce the computational cost of matching, we use multi-scale warped features to estimate disparity residuals and push the disparity search range in the cost volume to a minimum. Finally, we apply a refinement network to recover the loss of precision that is inherent in multi-scale approaches. We test our iCFR framework by adopting the matching networks from the state-of-the-art GANet and AANet. The result is 49$\times$ faster inference than GANet-deep and 4$\times$ lower memory consumption, with comparable error. Our best-performing network, which we call FRSNet, is scalable even up to an input resolution of 6K on a GTX 1080Ti, with inference time still below one second and accuracy comparable to AANet+. It outperforms all real-time stereo methods and achieves competitive accuracy on the KITTI benchmark.
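The core idea described above, estimating a coarse disparity once and then only searching a small residual range over warped features at each finer scale, can be illustrated with a minimal NumPy sketch. This is a hypothetical toy version using raw image intensities and absolute-difference costs, not the paper's learned feature extractors or refinement network; the function names, the scale schedule `(4, 2, 1)`, and the residual `radius` are illustrative assumptions.

```python
import numpy as np

def shift_right_image(right, disp):
    """Warp the right image toward the left view by a per-pixel disparity."""
    h, w = right.shape
    xs = np.arange(w)[None, :] - disp          # source columns in the right image
    xs = np.clip(np.round(xs).astype(int), 0, w - 1)
    return np.take_along_axis(right, xs, axis=1)

def match_cost(left, right_warped, d):
    """Absolute-difference cost for an extra (residual) shift of d pixels."""
    shifted = shift_right_image(right_warped, np.full(right_warped.shape, float(d)))
    return np.abs(left - shifted)

def coarse_to_fine_disparity(left, right, scales=(4, 2, 1), radius=2):
    """Toy iCFR-style loop: full (tiny) search at the coarsest scale, then
    only small residual corrections at each finer scale."""
    disp = None
    prev_s = None
    for s in scales:
        l = left[::s, ::s]
        r = right[::s, ::s]
        if disp is None:
            disp = np.zeros_like(l, dtype=float)
        else:
            # Upsample the previous estimate and rescale disparity values,
            # so the fine scale only needs to correct small residuals.
            factor = prev_s // s
            disp = np.kron(disp, np.ones((factor, factor))) * factor
            disp = disp[:l.shape[0], :l.shape[1]]
        r_warped = shift_right_image(r, disp)
        # Search a minimal residual range around the warped position.
        costs = np.stack([match_cost(l, r_warped, d)
                          for d in range(-radius, radius + 1)])
        disp = disp + (costs.argmin(axis=0) - radius)
        prev_s = s
    return disp
```

In this sketch, the residual search at every scale covers only `2 * radius + 1` hypotheses instead of the full disparity range, which is what keeps the cost volume, and hence computation and memory, small; the learned refinement network in the paper would then recover the precision lost to the coarse-to-fine quantization.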