Stereo depth estimation is of great interest in computer vision research. However, existing methods struggle to generalize and to predict reliably in hazardous regions, such as large uniform regions. To overcome these limitations, we propose Context Enhanced Path (CEP). CEP improves generalization and robustness against common failure cases in existing solutions by capturing long-range global information. We construct our stereo depth estimation model, Context Enhanced Stereo Transformer (CSTR), by plugging CEP into the state-of-the-art stereo depth estimation method Stereo Transformer. We evaluate CSTR on distinct public datasets, such as Scene Flow, Middlebury-2014, KITTI-2015, and MPI-Sintel, and find that it outperforms prior approaches by a large margin. For example, in the zero-shot synthetic-to-real setting, CSTR outperforms the best competing approaches on the Middlebury-2014 dataset by 11%. Our extensive experiments demonstrate that long-range information is critical for the stereo matching task and that CEP successfully captures such information.