Monocular depth estimation (MDE) in the self-supervised setting has emerged as a promising approach because it removes the need for ground-truth depth. Despite continuous efforts, MDE remains sensitive to scale changes, especially when all training samples come from a single camera. The problem is aggravated further because camera movement tightly couples the predicted depth with the scale change. In this paper, we present a scale-invariant approach for self-supervised MDE, in which scale-sensitive features (SSFs) are detached while scale-invariant features (SIFs) are further boosted. Specifically, a simple but effective data augmentation that imitates the camera zooming process is proposed to detach SSFs, making the model robust to scale changes. In addition, a dynamic cross-attention module is designed to boost SIFs by adaptively fusing multi-scale cross-attention features. Extensive experiments on the KITTI dataset demonstrate that the detaching and boosting strategies are mutually complementary in MDE, and our approach achieves new state-of-the-art performance against existing works, reducing the absolute relative error from 0.097 to 0.090. The code will be made public soon.
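To make the zoom-imitating augmentation concrete, the following is a minimal sketch, assuming PyTorch image tensors and pinhole camera intrinsics; the function name `zoom_augment`, the zoom range, and the center-crop choice are illustrative assumptions rather than the paper's exact implementation. It mimics zooming in by cropping a central window and resizing it back to the original resolution, adjusting the intrinsics so the apparent focal length grows with the zoom factor.

```python
import random
import torch
import torch.nn.functional as F

def zoom_augment(image, intrinsics, zoom_range=(1.0, 1.5)):
    """Imitate camera zoom-in by center-cropping and resizing back.

    image:      (B, 3, H, W) tensor.
    intrinsics: (B, 3, 3) pinhole camera matrices.
    Returns the zoomed images and the matching intrinsics.
    (Hypothetical sketch; parameters are not from the paper.)
    """
    _, _, h, w = image.shape
    z = random.uniform(*zoom_range)              # zoom factor >= 1
    ch, cw = int(round(h / z)), int(round(w / z))
    top, left = (h - ch) // 2, (w - cw) // 2

    # Crop a central window and upsample it to the original resolution,
    # which emulates a longer focal length (zoom-in) of roughly factor z.
    crop = image[:, :, top:top + ch, left:left + cw]
    zoomed = F.interpolate(crop, size=(h, w), mode="bilinear",
                           align_corners=False)

    # Update intrinsics: the crop shifts the principal point, and the
    # resize scales focal lengths and principal point by the same ratio.
    K = intrinsics.clone()
    sx, sy = w / cw, h / ch
    K[:, 0, 0] *= sx                             # fx
    K[:, 1, 1] *= sy                             # fy
    K[:, 0, 2] = (K[:, 0, 2] - left) * sx        # cx
    K[:, 1, 2] = (K[:, 1, 2] - top) * sy         # cy
    return zoomed, K
```

In this sketch the same zoom factor would be applied to every frame of a training snippet so that the photometric self-supervision remains geometrically consistent across views.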