Visual Odometry (VO), which estimates the position and orientation of a moving agent by analyzing the image sequences captured by on-board cameras, has been extensively investigated with the rising interest in autonomous driving. This paper studies monocular VO from the perspective of Deep Learning (DL). Unlike most current learning-based methods, our approach, called DeepAVO, is built on the intuition that features contribute discriminatively to different motion patterns. Specifically, we present a novel four-branch network that learns rotation and translation by leveraging Convolutional Neural Networks (CNNs) to focus on different quadrants of the optical flow input. To strengthen feature selection, we further introduce an effective channel-spatial attention mechanism that forces each branch to explicitly distill the information relevant to its specific Frame-to-Frame (F2F) motion estimate. Experiments on various datasets covering outdoor driving and indoor walking scenarios show that the proposed DeepAVO outperforms state-of-the-art monocular methods by a large margin, achieves performance competitive with a stereo VO algorithm, and exhibits promising generalization.
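The following is a minimal PyTorch sketch of the four-branch, quadrant-based design described above, not the authors' implementation: all module names (ChannelSpatialAttention, QuadrantBranch, DeepAVOSketch), layer sizes, and the feature dimension are illustrative assumptions; only the overall structure (split the optical flow into four quadrants, process each with an attention-equipped CNN branch, fuse features, and regress F2F rotation and translation) follows the abstract.

```python
# Hypothetical sketch of the DeepAVO idea; layer choices are assumptions.
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Channel re-weighting (squeeze-excite style) followed by spatial re-weighting."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_fc(x)  # per-channel attention weights
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.max(1, keepdim=True).values], dim=1)
        return x * self.spatial_conv(pooled)  # per-location attention weights

class QuadrantBranch(nn.Module):
    """One CNN branch that distills features from a single flow quadrant."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.attention = ChannelSpatialAttention(64)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, out_dim))

    def forward(self, flow_quadrant):
        return self.head(self.attention(self.encoder(flow_quadrant)))

class DeepAVOSketch(nn.Module):
    """Four quadrant branches fused into F2F rotation and translation heads."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.branches = nn.ModuleList(QuadrantBranch(feat_dim) for _ in range(4))
        self.rotation = nn.Linear(4 * feat_dim, 3)     # e.g. Euler angles
        self.translation = nn.Linear(4 * feat_dim, 3)  # translation vector

    def forward(self, flow):  # flow: (B, 2, H, W) optical flow between two frames
        h, w = flow.shape[-2] // 2, flow.shape[-1] // 2
        quadrants = [flow[..., :h, :w], flow[..., :h, w:],
                     flow[..., h:, :w], flow[..., h:, w:]]
        feats = torch.cat([b(q) for b, q in zip(self.branches, quadrants)], dim=1)
        return self.rotation(feats), self.translation(feats)
```

Giving each quadrant its own branch lets the attention modules specialize, since, for instance, forward translation and yaw rotation induce distinct flow patterns in different image regions.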