Self-supervised learning of depth and ego-motion from unlabeled monocular video has achieved promising results and drawn extensive attention. Most existing methods jointly train the depth and pose networks using the photometric consistency of adjacent frames, following the principle of structure-from-motion (SfM). However, the coupling between the depth and pose networks strongly affects learning performance, and the re-projection relation is sensitive to scale ambiguity, especially for pose learning. In this paper, we aim to improve depth-pose learning without auxiliary tasks, addressing the above issues by alternately training each task and by incorporating epipolar geometric constraints into an Iterative Closest Point (ICP)-based point-cloud matching process. Unlike joint training of the depth and pose networks, our key idea is to better exploit the mutual dependency of the two tasks by alternately training each network with its respective losses while keeping the other fixed. We also design a log-scale 3D structural consistency loss that puts more emphasis on smaller depth values during training. To make the optimization easier, we further incorporate epipolar geometry into the ICP-based learning process for pose learning. Extensive experiments on various benchmark datasets demonstrate the superiority of our algorithm over state-of-the-art self-supervised methods.
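The intuition behind the log-scale 3D structural consistency loss can be sketched as follows. Comparing depths in log space makes a fixed metric error at a small depth contribute much more to the loss than the same error at a large depth. The function name, the per-point L1 formulation, and the `eps` stabilizer below are illustrative assumptions, not the paper's exact definition:

```python
import math

def log_scale_consistency_loss(pred_depth, proj_depth, eps=1e-6):
    """Illustrative sketch (not the paper's exact loss): mean absolute
    difference of depths in log space, which weights relative rather
    than absolute depth errors and so emphasizes nearby structure."""
    assert len(pred_depth) == len(proj_depth)
    total = 0.0
    for d_pred, d_proj in zip(pred_depth, proj_depth):
        # eps guards against log(0) for degenerate depth predictions
        total += abs(math.log(d_pred + eps) - math.log(d_proj + eps))
    return total / len(pred_depth)

# The same 0.5 m discrepancy costs far more at 1 m than at 50 m:
near = log_scale_consistency_loss([1.0], [1.5])   # ~0.405
far = log_scale_consistency_loss([50.0], [50.5])  # ~0.010
```

Under this formulation, supervision naturally concentrates on the close-range geometry that dominates photometric re-projection quality, which matches the stated motivation of emphasizing smaller depth values.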