Although deep learning has greatly improved stereo matching accuracy in the last few years, efficiently recovering sharp boundaries and producing high-resolution outputs remains challenging. In this paper, we propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework that is compatible with a wide class of 2D and 3D architectures and addresses both issues. Specifically, we use bimodal mixture densities as the output representation and show that this allows for sharp and precise disparity estimates near discontinuities while explicitly modeling the aleatoric uncertainty inherent in the observations. Moreover, we formulate disparity estimation as a continuous problem in the image domain, which allows our model to query disparities at arbitrary spatial precision. We carry out comprehensive experiments on a new high-resolution, highly realistic synthetic stereo dataset consisting of stereo pairs at 8 Mpx resolution, as well as on real-world stereo datasets. Our experiments demonstrate increased depth accuracy near object boundaries and the prediction of ultra-high-resolution disparity maps on standard GPUs. We demonstrate the flexibility of our technique by improving the performance of a variety of stereo backbones.
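To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes Laplacian mixture components, mode selection by highest mixture density, and bilinear interpolation for continuous queries; none of these choices is stated in the abstract. All function and parameter names (`bimodal_pdf`, `select_mode`, `bilinear_sample`, `pi`, `mu1`, `b1`, ...) are hypothetical.

```python
import math


def laplace_pdf(x, mu, b):
    """Density of a Laplace distribution with location mu and scale b (assumed component type)."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def bimodal_pdf(d, pi, mu1, b1, mu2, b2):
    """Two-component mixture density over disparity d; pi is the weight of the first mode."""
    return pi * laplace_pdf(d, mu1, b1) + (1.0 - pi) * laplace_pdf(d, mu2, b2)


def nll(d_gt, pi, mu1, b1, mu2, b2):
    """Negative log-likelihood of a ground-truth disparity, usable as a training loss."""
    return -math.log(bimodal_pdf(d_gt, pi, mu1, b1, mu2, b2))


def select_mode(pi, mu1, b1, mu2, b2):
    """Return the mode center with the higher mixture density (assumed inference rule).

    Picking one mode, rather than averaging, avoids blending foreground and
    background disparities at an object boundary.
    """
    p1 = bimodal_pdf(mu1, pi, mu1, b1, mu2, b2)
    p2 = bimodal_pdf(mu2, pi, mu1, b1, mu2, b2)
    return mu1 if p1 >= p2 else mu2


def bilinear_sample(grid, x, y):
    """Sample a 2D grid (list of rows) at a continuous location (x, y).

    Illustrates how a discrete feature map could be queried at arbitrary
    spatial precision; coordinates are clamped to the grid bounds.
    """
    h, w = len(grid), len(grid[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    top = grid[y0][x0] * (1.0 - tx) + grid[y0][x1] * tx
    bottom = grid[y1][x0] * (1.0 - tx) + grid[y1][x1] * tx
    return top * (1.0 - ty) + bottom * ty


if __name__ == "__main__":
    # A pixel near an edge: foreground at disparity 40, background at 10.
    params = dict(pi=0.6, mu1=40.0, b1=1.0, mu2=10.0, b2=1.0)
    mean = params["pi"] * params["mu1"] + (1.0 - params["pi"]) * params["mu2"]
    print("mixture mean:", mean)
    print("selected mode:", select_mode(**params))
    print("sub-pixel query:", bilinear_sample([[0.0, 1.0], [2.0, 3.0]], 0.5, 0.5))
```

In this toy setting the mixture mean falls between the two surfaces, at a disparity that belongs to neither, whereas selecting the dominant mode returns the foreground value. This is the intuition behind sharper estimates near discontinuities. The spread of the two modes and the weight `pi` also give a per-pixel handle on uncertainty.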