Self-supervised monocular depth estimation has become an appealing solution to the lack of ground truth labels, but its reconstruction loss often produces over-smoothed results across object boundaries and cannot handle occlusion explicitly. In this paper, we propose a new approach that leverages pseudo ground truth depth maps of stereo images generated by self-supervised stereo matching methods. A confidence map of the pseudo ground truth depth map is estimated to mitigate the performance degradation caused by inaccurate pseudo depth maps. To cope with the prediction error of the confidence map itself, we also use a threshold network that learns the threshold dynamically, conditioned on the pseudo depth maps. The pseudo depth labels that pass the thresholded confidence map are used to supervise the monocular depth network. Furthermore, we propose a probabilistic framework that refines the monocular depth map with the help of its uncertainty map through a pixel-adaptive convolution (PAC) layer. Experimental results show that the proposed method outperforms state-of-the-art monocular depth estimation methods. Lastly, we show that the proposed threshold learning can also be used to improve the performance of existing confidence estimation approaches.
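The confidence-based filtering of pseudo labels can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `masked_pseudo_label_loss`, the choice of an L1 penalty, and the fixed scalar threshold are assumptions made for illustration only. In the proposed method the threshold is predicted by a learned network rather than set by hand.

```python
import numpy as np

def masked_pseudo_label_loss(pred_depth, pseudo_depth, confidence, threshold):
    """Hypothetical helper: L1 loss over pixels whose confidence exceeds the threshold.

    The threshold is passed in as a plain number here (an assumption);
    the abstract describes learning it with a threshold network.
    """
    # Keep only pixels where the pseudo label is considered reliable.
    mask = confidence > threshold
    if not mask.any():
        # No reliable pixels: contribute no supervision.
        return 0.0, mask
    # Mean absolute error restricted to the reliable pixels.
    loss = float(np.abs(pred_depth - pseudo_depth)[mask].mean())
    return loss, mask

# Toy 2x2 example with made-up values.
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
pseudo = np.array([[1.5, 2.0], [9.0, 4.5]])
conf = np.array([[0.9, 0.8], [0.1, 0.7]])

loss, mask = masked_pseudo_label_loss(pred, pseudo, conf, threshold=0.5)
```

In this toy case the pixel with the large pseudo-label error also has low confidence, so it is excluded from the mask and does not affect the loss.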