Existing self-supervised monocular depth estimation methods can dispense with expensive annotations and achieve promising results. However, these methods suffer severe performance degradation when a model trained at a fixed resolution is directly evaluated at other resolutions. In this paper, we propose a resolution-adaptive self-supervised monocular depth estimation method (RA-Depth) that learns the scale invariance of scene depth. Specifically, we propose a simple yet efficient data augmentation method that generates images of the same scene at arbitrary scales. We then develop a dual high-resolution network whose multi-path encoder and decoder interact densely to aggregate multi-scale features for accurate depth inference. Finally, to explicitly learn the scale invariance of scene depth, we formulate a cross-scale depth consistency loss on depth predictions at different scales. Extensive experiments on the KITTI, Make3D, and NYU-V2 datasets demonstrate that RA-Depth not only achieves state-of-the-art performance but also exhibits good resolution adaptation.
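The cross-scale depth consistency idea can be illustrated with a minimal sketch: predictions for the same scene at two resolutions are brought onto a common grid and penalized for disagreement. This is a simplified illustration only, not the paper's exact formulation; the function name and the choice of an L1 penalty are assumptions.

```python
import torch
import torch.nn.functional as F


def cross_scale_depth_consistency(depth_hr: torch.Tensor,
                                  depth_lr: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of a cross-scale depth consistency loss.

    depth_hr: (B, 1, H, W) depth predicted from the high-resolution input
    depth_lr: (B, 1, h, w) depth predicted from a lower-resolution input
    of the same scene.
    """
    # Upsample the low-resolution prediction onto the high-resolution grid
    depth_lr_up = F.interpolate(depth_lr, size=depth_hr.shape[-2:],
                                mode="bilinear", align_corners=False)
    # L1 difference encourages the network to predict the same scene depth
    # regardless of the input resolution (scale invariance)
    return (depth_hr - depth_lr_up).abs().mean()


# Usage with dummy predictions at two resolutions
d_hr = torch.rand(1, 1, 192, 640)
d_lr = torch.rand(1, 1, 96, 320)
loss = cross_scale_depth_consistency(d_hr, d_lr)
```

In practice such a term would be added to the usual photometric and smoothness losses used in self-supervised training.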