Recent work in stereo vision has focused largely on obtaining accurate dense disparity maps with passive stereo. Active vision systems yield more accurate dense disparity estimates than passive stereo. However, subpixel-accurate disparity estimation remains an open problem that has received little attention. In this paper, we propose a new learning strategy for training neural networks to estimate high-quality subpixel disparity maps for semi-dense active stereo vision. The key insight is that neural networks can double their accuracy if they jointly learn to refine the disparity map and to invalidate the pixels for which there is insufficient information to correct the disparity estimate. Our approach is based on Bayesian modeling in which validated and invalidated pixels are characterized by their stochastic properties, allowing the model to learn for itself which pixels deserve its attention. On active stereo datasets such as Active-Passive SimStereo, we show that the proposed method outperforms current state-of-the-art active stereo models. We also show that the proposed approach compares favorably with state-of-the-art passive stereo models on the Middlebury dataset.
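The abstract does not give the model's equations. The following is a minimal sketch of the general idea of jointly refining and invalidating pixels, assuming a two-component mixture likelihood chosen for illustration (not necessarily the paper's formulation): a validated pixel's ground-truth disparity follows a Laplace distribution centred on the refined estimate with scale `b`, while an invalidated pixel's disparity is uniform over `[0, d_max]`, i.e. the estimate carries no information. The functions `mixture_nll` and `semi_dense_metrics` and the parameters `b`, `d_max`, and `threshold` are hypothetical names.

```python
import numpy as np


def mixture_nll(d_pred, d_gt, p_valid, b=0.5, d_max=192.0, eps=1e-12):
    """Per-pixel negative log-likelihood of a hypothetical validity mixture.

    p_valid is the predicted probability that a pixel is valid.
    Valid component:   Laplace(d_gt | d_pred, b), so the refined estimate is trusted.
    Invalid component: Uniform(0, d_max), so the estimate carries no information.
    """
    laplace = np.exp(-np.abs(d_gt - d_pred) / b) / (2.0 * b)
    uniform = 1.0 / d_max
    likelihood = p_valid * laplace + (1.0 - p_valid) * uniform
    return -np.log(likelihood + eps)


def semi_dense_metrics(d_pred, d_gt, p_valid, threshold=0.5):
    """Mean absolute error over kept pixels, plus the fraction of pixels kept."""
    keep = p_valid > threshold
    density = keep.mean()
    mae = np.abs(d_pred[keep] - d_gt[keep]).mean() if keep.any() else float("nan")
    return mae, density


# Toy usage: the third pixel has a 5 px error and a low validity probability.
d_gt = np.array([10.0, 20.0, 30.0, 40.0])
d_pred = np.array([10.1, 19.8, 35.0, 40.2])
p_valid = np.array([0.9, 0.95, 0.1, 0.8])

loss = mixture_nll(d_pred, d_gt, p_valid).mean()
mae, density = semi_dense_metrics(d_pred, d_gt, p_valid)
# density = 0.75 (3 of 4 pixels kept); mae is about 0.167 px on the kept pixels
```

Because the likelihood is linear in `p_valid`, this loss favours invalidating a pixel whenever the Laplace density at its error falls below the uniform density. With `b = 0.5` and `d_max = 192`, that happens for errors larger than `b * ln(d_max / (2b))`, roughly 2.6 px. A network trained this way therefore learns to flag pixels it cannot correct rather than forcing a refinement everywhere.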