Localizing stereo boundaries and predicting nearby disparities are difficult because stereo boundaries induce occluded regions where matching cues are absent. Most modern computer vision algorithms treat occlusions secondarily (e.g., via left-right consistency checks after matching) or rely on high-level cues to improve nearby disparities (e.g., via deep networks and large training sets). They ignore the geometry of stereo occlusions, which dictates that the spatial extent of occlusion must equal the amplitude of the disparity jump that causes it. This paper introduces an energy and level-set optimizer that improves boundaries by encoding occlusion geometry. Our model applies to two-layer, figure-ground scenes, and it can be implemented cooperatively using messages that pass predominantly between parents and children in an undecimated hierarchy of multi-scale image patches. In a small collection of figure-ground scenes curated from the Middlebury and Falling Things stereo datasets, our model provides more accurate boundaries than previous occlusion-handling stereo techniques. This suggests new directions for creating cooperative stereo systems that incorporate occlusion cues in a human-like manner.
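The geometric constraint stated in the abstract, that the width of an occluded region equals the size of the disparity jump causing it, can be illustrated on a single synthetic scanline. The sketch below is not the paper's energy or level-set optimizer. It assumes a rectified stereo pair, integer disparities, the left image as reference, and the convention that a left-image pixel at column x appears in the right image at column x - d(x). The function name `occluded_in_right` and all scene values are hypothetical and chosen only for illustration.

```python
import numpy as np

def occluded_in_right(disparity):
    """Flag left-image pixels on one scanline that are hidden in the right view.

    Assumes integer disparities and the mapping x_right = x_left - d(x_left).
    Pixels that project outside the right image are left unflagged.
    """
    n = disparity.shape[0]
    x = np.arange(n)
    target = x - disparity                      # column each pixel lands on in the right image
    valid = (target >= 0) & (target < n)

    # For every right-image column, record the largest disparity that lands there.
    # The largest disparity belongs to the nearest surface, which is the visible one.
    nearest = np.full(n, -1, dtype=int)
    np.maximum.at(nearest, target[valid], disparity[valid])

    # A pixel is occluded when a nearer surface lands on the same right-image column.
    occluded = np.zeros(n, dtype=bool)
    occluded[valid] = disparity[valid] < nearest[target[valid]]
    return occluded

# Hypothetical two-layer, figure-ground scanline: background at disparity 2,
# foreground at disparity 7 covering columns 20..29.
disparity = np.full(40, 2, dtype=int)
disparity[20:30] = 7
jump = 7 - 2

occluded = occluded_in_right(disparity)
print("disparity jump:", jump)
print("occluded columns:", np.flatnonzero(occluded).tolist())
print("occlusion width:", int(occluded.sum()))
```

For these values the occluded run covers columns 15 through 19 of the left scanline, five pixels wide, which matches the disparity jump of 5. The run sits immediately to the left of the foreground in the left image, the region where no matching cue is available in the right view.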