Self-supervised deep-learning methods for 3D scene understanding can overcome the difficulty of acquiring densely labeled ground truth and have advanced considerably. However, occlusions and moving objects remain major limitations. In this paper, we address these limitations with self-supervised depth and camera pose estimation that is guided by a learnable occlusion-aware optical flow and trained with an adaptive cross-weighted loss. First, we train an optical flow network fused with a learnable occlusion mask, using an occlusion-aware photometric loss that exploits the temporally supplemental information and the backward-forward consistency of adjacent views. Then, we design an adaptive cross-weighted loss between the depth-pose and optical flow losses, built on the geometric and photometric errors, to distinguish moving objects that violate the static-scene assumption. Our method shows promising results on the KITTI, Make3D, and Cityscapes datasets across multiple tasks, and it generalizes well to a variety of challenging scenarios.
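The abstract does not give the exact loss formulations, so the following is only a minimal NumPy sketch of the two ideas under stated assumptions. The occlusion mask here comes from a fixed forward-backward consistency heuristic in the style of UnFlow (thresholds `alpha1=0.01`, `alpha2=0.5`) rather than the paper's learnable mask, the photometric error is plain L1 without SSIM, and `cross_weighted_loss` is a hypothetical soft weighting, not the authors' adaptive cross-weighted loss. The rigid (depth-pose) error map is taken as an input; computing it by reprojecting pixels with predicted depth and pose is omitted. All function names are illustrative.

```python
import numpy as np


def bilinear_warp(img, flow):
    """Backward-warp img (H, W) by flow (H, W, 2): out[y, x] ~ img[y + v, x + u].

    flow[..., 0] is the horizontal (u) and flow[..., 1] the vertical (v)
    displacement in pixels; sample positions are clamped to the image border.
    """
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom


def occlusion_mask(flow_fw, flow_bw, alpha1=0.01, alpha2=0.5):
    """Fixed forward-backward consistency check (assumed heuristic, not learned).

    Returns 1.0 where a pixel of frame t is judged visible in frame t+1, else 0.0.
    """
    # Backward flow sampled at the forward-displaced positions, one channel at a time.
    bw_at_fw = np.stack(
        [bilinear_warp(flow_bw[..., c], flow_fw) for c in range(2)], axis=-1
    )
    # For visible pixels the forward and backward flows should roughly cancel out.
    diff = np.sum((flow_fw + bw_at_fw) ** 2, axis=-1)
    mag = np.sum(flow_fw ** 2, axis=-1) + np.sum(bw_at_fw ** 2, axis=-1)
    return (diff <= alpha1 * mag + alpha2).astype(np.float64)


def occlusion_aware_photometric_loss(img_t, img_t1, flow_fw, mask):
    """Masked L1 error between I_t and I_{t+1} warped back by the forward flow."""
    err = np.abs(img_t - bilinear_warp(img_t1, flow_fw))
    # Average only over visible pixels; occluded pixels contribute nothing.
    return np.sum(mask * err) / max(np.sum(mask), 1.0)


def cross_weighted_loss(err_rigid, err_flow, tau=0.1):
    """Hypothetical per-pixel cross weighting of depth-pose and flow errors.

    Each weight depends on both error maps: where the rigid (depth-pose) error is
    much larger than the flow error, the pixel likely belongs to a moving object
    and is down-weighted in the depth-pose term.
    """
    # Softmax over the negated errors, shifted by the per-pixel minimum for stability.
    m = np.minimum(err_rigid, err_flow)
    a = np.exp(-(err_rigid - m) / tau)
    b = np.exp(-(err_flow - m) / tau)
    w_rigid, w_flow = a / (a + b), b / (a + b)
    moving = err_rigid > err_flow  # crude moving-object indicator
    loss = np.mean(w_rigid * err_rigid + w_flow * err_flow)
    return loss, moving
```

In an autodiff framework one would likely treat `w_rigid` and `w_flow` as constants (stop-gradient) so the networks cannot reduce the loss simply by shifting weight between the two terms; this, too, is an assumption rather than something the abstract states.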