Self-supervised methods play an increasingly important role in monocular depth estimation thanks to their great potential and low annotation cost. To close the gap with supervised methods, recent works exploit extra constraints, e.g., semantic segmentation; however, these methods inevitably increase the burden on the model. In this paper, we provide theoretical and empirical evidence that the potential capacity of self-supervised monocular depth estimation can be excavated without incurring this cost. In particular, we propose (1) a novel data augmentation approach called data grafting, which forces the model to explore cues beyond vertical image position to infer depth, (2) an exploratory self-distillation loss, supervised by self-distillation labels generated by our new post-processing method, selective post-processing, and (3) the full-scale network, designed to endow the encoder with specialization for the depth estimation task and to enhance the representational power of the model. Extensive experiments show that our contributions bring significant performance improvements over the baseline with even less computational overhead, and that our model, named EPCDepth, surpasses previous state-of-the-art methods, even those supervised by additional constraints.
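The data-grafting idea in contribution (1) can be sketched as follows: two training images are spliced along the vertical axis so that vertical image position alone no longer predicts depth. This is a minimal illustrative sketch, not the paper's exact implementation; the function name and the cut-ratio parameter are assumptions for illustration.

```python
import numpy as np

def data_graft(img_a, img_b, cut_ratio=0.5):
    """Illustrative data-grafting sketch: split two images at the same
    row and swap their lower regions, producing two grafted samples
    whose vertical position no longer correlates with a single scene."""
    assert img_a.shape == img_b.shape, "images must share a shape"
    cut = int(img_a.shape[0] * cut_ratio)  # grafting row (assumed fixed here)
    grafted_a = np.concatenate([img_a[:cut], img_b[cut:]], axis=0)
    grafted_b = np.concatenate([img_b[:cut], img_a[cut:]], axis=0)
    return grafted_a, grafted_b
```

In a real training pipeline the same cut would also be applied to the corresponding target views so that photometric supervision stays consistent across the grafted regions.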