Depth and ego-motion estimation are essential for the localization and navigation of autonomous robots and for autonomous driving. Recent studies have made it possible to learn per-pixel depth and ego-motion from unlabeled monocular video. This paper proposes a novel unsupervised training framework with 3D hierarchical refinement and augmentation based on explicit 3D geometry. In this framework, depth and pose estimation are hierarchically and mutually coupled so that the estimated pose is refined layer by layer. An intermediate view image is synthesized by warping the pixels of an image with the estimated depth and the coarse pose. The residual pose transformation is then estimated from this new view image and the image of the adjacent frame, and is used to refine the coarse pose. The iterative refinement is implemented in a differentiable manner, so the whole framework can be optimized jointly. In addition, a new image augmentation method is proposed for pose estimation: it synthesizes a new view image, so the pose is augmented in 3D space while the output is still an augmented 2D image. Experiments on KITTI demonstrate that our depth estimation achieves state-of-the-art performance and even surpasses recent approaches that rely on auxiliary tasks. Our visual odometry outperforms all recent unsupervised monocular learning-based methods and achieves performance competitive with the geometry-based method ORB-SLAM2 with back-end optimization.
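The geometry behind the intermediate view and the pose refinement can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a standard pinhole camera with known intrinsics `K`, 4x4 rigid transforms, and a composition order in which the residual pose is applied after the coarse pose. The function names `reproject` and `refine_pose` are hypothetical, and the image sampling step (for example bilinear interpolation at the returned coordinates) is omitted.

```python
import numpy as np

def reproject(depth, K, T):
    """Map every pixel of a source view into a second view.

    depth: (H, W) per-pixel depth of the source view
    K:     (3, 3) pinhole intrinsics (assumed known)
    T:     (4, 4) rigid transform from source camera to target camera
    Returns (H, W, 2) pixel coordinates (u, v) in the target view.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T      # back-project pixels to rays with z = 1
    pts = rays * depth[..., None]        # 3D points in the source camera frame
    pts = pts @ T[:3, :3].T + T[:3, 3]   # move points into the target camera frame
    proj = pts @ K.T                     # project with the same intrinsics
    return proj[..., :2] / proj[..., 2:3]

def refine_pose(T_coarse, T_residual):
    """Compose poses: coarse first, then residual (assumed convention)."""
    return T_residual @ T_coarse

def translation(tx):
    """Helper for the example: pure translation along the camera x axis."""
    T = np.eye(4)
    T[0, 3] = tx
    return T

# Toy example: constant depth of 10 and a focal length of 100 pixels.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 4), 10.0)

coords_coarse = reproject(depth, K, translation(1.0))
T_refined = refine_pose(translation(1.0), translation(0.5))
coords_refined = reproject(depth, K, T_refined)
```

In this toy setup each pixel shifts horizontally by `fx * tx / depth`, so the coarse pose moves pixels by 10 columns and the refined pose by 15. A trainable version would express the same operations in a deep learning framework so that gradients flow through both the depth and the pose.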