The great potential of unsupervised monocular depth estimation has been demonstrated by many works, owing to its low annotation cost and accuracy that is impressively close to supervised methods. To further improve performance, recent works mainly focus on designing more complex network structures and exploiting extra supervised information, e.g., semantic segmentation. These methods optimize their models by exploiting, to varying degrees, the reconstruction relationship between the target and reference images. However, previous works have shown that this image-reconstruction objective is prone to getting trapped in local minima. In this paper, our core idea is to guide the optimization with prior knowledge from a pretrained Flow-Net, and we show that the bottleneck of unsupervised monocular depth estimation can be broken with our simple but effective framework named FG-Depth. In particular, we propose (i) a flow distillation loss to replace the typical photometric loss, which limits the capacity of the model, and (ii) a prior-flow-based mask to remove invalid pixels that introduce noise into the training loss. Extensive experiments demonstrate the effectiveness of each component, and our approach achieves state-of-the-art results on both the KITTI and NYU-Depth-v2 datasets.
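To make the two proposed components concrete, the following is a minimal PyTorch-style sketch of how a flow distillation loss combined with a prior-flow-based validity mask might be assembled. The function names (`flow_distillation_loss`, `prior_flow_mask`), the L1 penalty, and the out-of-bounds and magnitude criteria are illustrative assumptions rather than the paper's exact formulation; `rigid_flow` is assumed to be the flow induced by the predicted depth and camera pose, and `teacher_flow` the output of the pretrained Flow-Net.

```python
import torch


def flow_distillation_loss(rigid_flow, teacher_flow, mask):
    """L1 distillation between the depth-induced rigid flow and the
    teacher (prior) flow, averaged over valid pixels only.

    rigid_flow, teacher_flow: (B, 2, H, W); mask: (B, 1, H, W) in {0, 1}.
    """
    diff = (rigid_flow - teacher_flow).abs() * mask  # broadcast over channels
    # Normalize by the number of valid pixel-channels (2 flow channels).
    return diff.sum() / (2.0 * mask.sum().clamp(min=1.0))


def prior_flow_mask(teacher_flow, max_magnitude=200.0):
    """Hypothetical validity mask: drop pixels whose prior flow leaves the
    image or has an implausibly large magnitude (one plausible criterion;
    the paper's exact rule may differ)."""
    b, _, h, w = teacher_flow.shape
    # Base pixel grid of (x, y) coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=teacher_flow.dtype),
        torch.arange(w, dtype=teacher_flow.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).to(teacher_flow.device)  # (2, H, W)
    target = grid.unsqueeze(0) + teacher_flow                    # (B, 2, H, W)
    in_bounds = (
        (target[:, 0] >= 0) & (target[:, 0] <= w - 1)
        & (target[:, 1] >= 0) & (target[:, 1] <= h - 1)
    )
    plausible = teacher_flow.norm(dim=1) < max_magnitude
    return (in_bounds & plausible).unsqueeze(1).float()          # (B, 1, H, W)
```

In this sketch the teacher flow plays both roles from the abstract: it is the distillation target that replaces the photometric loss, and it determines which pixels are trusted enough to contribute to that loss.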