Stereo matching is a fundamental building block for many vision and robotics applications. An informative and concise cost volume representation is vital for accurate and efficient stereo matching. In this paper, we present a novel cost volume construction method that generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume. To generate reliable attention weights, we propose multi-level adaptive patch matching, which improves the distinctiveness of the matching cost at different disparities, even in textureless regions. The proposed cost volume, named attention concatenation volume (ACV), can be seamlessly embedded into most stereo matching networks. The resulting networks can use a more lightweight aggregation network while achieving higher accuracy; for example, GwcNet with ACV achieves higher accuracy using only 1/25 of the parameters of its aggregation network. Furthermore, we design a highly accurate network (ACVNet) based on our ACV, which achieves state-of-the-art performance on several benchmarks.
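
The core idea, attention weights derived from correlation filtering a concatenation volume, can be illustrated with a minimal NumPy sketch. This is not the ACVNet implementation: the function names are hypothetical, the attention here is a plain softmax over a single-level mean correlation (an assumption made for brevity), and the multi-level adaptive patch matching and learned layers described above are not reproduced.

```python
import numpy as np

def concat_volume(left, right, max_disp):
    """Stack left and shifted right features: (C, H, W) -> (2C, D, H, W)."""
    C, H, W = left.shape
    vol = np.zeros((2 * C, max_disp, H, W), dtype=left.dtype)
    for d in range(max_disp):
        # Left pixel x is paired with right pixel x - d; unmatched columns stay zero.
        vol[:C, d, :, d:] = left[:, :, d:]
        vol[C:, d, :, d:] = right[:, :, :W - d]
    return vol

def correlation_volume(left, right, max_disp):
    """Channel-averaged correlation per disparity: (C, H, W) -> (D, H, W)."""
    C, H, W = left.shape
    vol = np.zeros((max_disp, H, W), dtype=left.dtype)
    for d in range(max_disp):
        vol[d, :, d:] = (left[:, :, d:] * right[:, :, :W - d]).mean(axis=0)
    return vol

def attention_weights(corr):
    """Softmax over the disparity axis (an assumed, simplified normalization)."""
    e = np.exp(corr - corr.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def attention_concat_volume(left, right, max_disp):
    """Weight every channel of the concatenation volume by the attention map."""
    attn = attention_weights(correlation_volume(left, right, max_disp))
    # attn has shape (D, H, W); broadcast it across the 2C channel axis.
    return attn[None] * concat_volume(left, right, max_disp)

# Usage with random stand-in feature maps (C=4, H=5, W=8).
rng = np.random.default_rng(0)
left_feat = rng.standard_normal((4, 5, 8))
right_feat = rng.standard_normal((4, 5, 8))
acv = attention_concat_volume(left_feat, right_feat, max_disp=3)
print(acv.shape)  # (8, 3, 5, 8)
```

In this sketch, disparities with low correlation receive small weights, so their entries in the concatenation volume are suppressed before any aggregation network sees them.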