Self-supervised monocular depth estimation has shown impressive results in static scenes. However, it relies on the multi-view consistency assumption for training networks, which is violated in dynamic object regions and occlusions. Consequently, existing methods show poor accuracy in dynamic scenes, and the estimated depth maps are blurred at object boundaries because these regions are usually occluded in other training views. In this paper, we propose SC-DepthV3 to address these challenges. Specifically, we introduce an external pretrained monocular depth estimation model to generate a single-image depth prior, termed pseudo-depth, based on which we propose novel losses that boost self-supervised training. As a result, our model can predict sharp and accurate depth maps even when trained on monocular videos of highly dynamic scenes. We demonstrate significantly superior performance over previous methods on six challenging datasets, and we provide detailed ablation studies for the proposed loss terms. Source code and data will be released at https://github.com/JiawangBian/sc_depth_pl
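To make the idea concrete, below is a minimal sketch, not the paper's exact formulation, of how a pseudo-depth prior could regularize self-supervised training. The prediction and the prior are compared after median normalization (a single-image prior is typically only defined up to scale), and the penalty is restricted to a hypothetical dynamic-region mask where photometric multi-view supervision breaks down. All names and the specific loss form here are illustrative assumptions.

```python
import torch


def normalize_depth(depth: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Median-normalize a depth map: a pseudo-depth prior from an
    # off-the-shelf monocular model is assumed to be known only up to scale.
    return depth / (depth.median() + eps)


def pseudo_depth_loss(pred_depth: torch.Tensor,
                      pseudo_depth: torch.Tensor,
                      dynamic_mask: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
    # Penalize disagreement between the predicted depth and the pseudo-depth
    # prior inside dynamic regions, where photometric loss is unreliable.
    diff = (normalize_depth(pred_depth) - normalize_depth(pseudo_depth)).abs()
    return (diff * dynamic_mask).sum() / (dynamic_mask.sum() + eps)


# Toy usage with random tensors standing in for network outputs.
pred = torch.rand(1, 1, 128, 416) + 0.1            # depth from the trained network
pseudo = torch.rand(1, 1, 128, 416) + 0.1          # prior from a pretrained model
mask = (torch.rand(1, 1, 128, 416) > 0.8).float()  # hypothetical dynamic-region mask
print(pseudo_depth_loss(pred, pseudo, mask))
```

In practice such a term would be weighted and combined with the usual photometric and smoothness losses; the masking confines the prior's influence to regions the multi-view signal cannot supervise.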