Recent techniques in self-supervised monocular depth estimation are approaching the performance of supervised methods, but operate only at low resolution. We show that high resolution is key to high-fidelity self-supervised monocular depth prediction. Inspired by recent deep learning methods for Single-Image Super-Resolution, we propose a sub-pixel convolutional layer extension for depth super-resolution that accurately synthesizes high-resolution disparities from their corresponding low-resolution convolutional features. In addition, we introduce a differentiable flip-augmentation layer that accurately fuses predictions from the image and its horizontally flipped version, reducing the effect of the left and right shadow regions that occlusions generate in the disparity map. Both contributions provide significant performance gains over the state of the art in self-supervised depth and pose estimation on the public KITTI benchmark. A video of our approach can be found at https://youtu.be/jKNgBeBMx0I.
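The two layers described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names `pixel_shuffle`, `hflip`, and `fuse_flipped` are hypothetical, the channel ordering follows the common pixel-shuffle convention, and the plain average used for fusion is an assumption, since the abstract does not specify the fusion rule. In a real network the rearrangement would follow a learned convolution; only the rearrangement step is shown here.

```python
def pixel_shuffle(features, r):
    """Rearrange C*r*r low-resolution channels (each H x W) into
    C channels of size (H*r) x (W*r). Assumed channel ordering:
    input channel c*r*r + i*r + j fills output offset (i, j)."""
    n_in = len(features)
    if n_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    h, w = len(features[0]), len(features[0][0])
    out = []
    for c in range(n_in // (r * r)):
        plane = [[0.0] * (w * r) for _ in range(h * r)]
        for i in range(r):
            for j in range(r):
                src = features[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        plane[y * r + i][x * r + j] = src[y][x]
        out.append(plane)
    return out


def hflip(plane):
    """Mirror a 2-D map horizontally."""
    return [row[::-1] for row in plane]


def fuse_flipped(disp, disp_from_flipped):
    """Fuse two disparity maps: one predicted from the image and one
    predicted from its mirrored copy. The second map is flipped back
    so both are aligned, then the two are averaged (assumed rule)."""
    realigned = hflip(disp_from_flipped)
    return [[0.5 * (a + b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(disp, realigned)]


# Four 1x1 feature channels become one 2x2 disparity map (r = 2).
upsampled = pixel_shuffle([[[1.0]], [[2.0]], [[3.0]], [[4.0]]], 2)

# Fuse a 1x2 prediction with the prediction made on the mirrored image.
fused = fuse_flipped([[1.0, 3.0]], [[5.0, 3.0]])
```

Because both operations are index rearrangements followed by an average, gradients pass through them unchanged in an automatic-differentiation framework, which is consistent with the abstract's description of the flip-augmentation layer as differentiable.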