Visual Odometry (VO) is used in many applications, including robotics and autonomous systems. However, traditional approaches based on feature matching are computationally expensive and do not directly address failure cases, instead relying on heuristics to detect failure. In this work, we propose a deep learning-based VO model that efficiently estimates 6-DoF poses, together with a confidence model for these estimates. We utilise a hybrid CNN-RNN model to learn feature representations from image sequences. We then employ a Mixture Density Network (MDN), which estimates camera motion as a mixture of Gaussians based on the extracted spatio-temporal representations. Our model uses pose labels as a source of supervision, but derives uncertainties in an unsupervised manner. We evaluate the proposed model on the KITTI and nuScenes datasets and report extensive quantitative and qualitative results to analyse the performance of both pose and uncertainty estimation. Our experiments show that the proposed model exceeds state-of-the-art performance and detects failure cases using the predicted pose uncertainty.
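As an illustration of how a mixture-of-Gaussians output can be trained from pose labels alone while still yielding an uncertainty, the following is a minimal sketch and not the model described above. It assumes diagonal covariances, a negative log-likelihood loss, and mixture variance as the confidence signal; none of these choices are stated in the abstract. The function names `mdn_nll` and `mixture_mean_and_variance` are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mdn_nll(logits, means, log_sigmas, target):
    """Negative log-likelihood of `target` under a diagonal Gaussian mixture.

    logits:     K unnormalised mixture weights
    means:      K lists of D component means (D = 6 for a 6-DoF pose)
    log_sigmas: K lists of D log standard deviations (assumed diagonal)
    target:     D ground-truth pose values
    """
    log_norm = log_sum_exp(logits)  # softmax normaliser in log space
    terms = []
    for logit, mu, log_sigma in zip(logits, means, log_sigmas):
        log_pdf = 0.0
        for y, m, ls in zip(target, mu, log_sigma):
            z = (y - m) / math.exp(ls)
            log_pdf += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
        terms.append((logit - log_norm) + log_pdf)
    return -log_sum_exp(terms)


def mixture_mean_and_variance(logits, means, log_sigmas):
    """Per-dimension mean and variance of the mixture (law of total variance)."""
    log_norm = log_sum_exp(logits)
    weights = [math.exp(l - log_norm) for l in logits]
    dims = len(means[0])
    mean = [sum(w * mu[d] for w, mu in zip(weights, means)) for d in range(dims)]
    second_moment = [
        sum(w * (math.exp(2.0 * ls[d]) + mu[d] ** 2)
            for w, mu, ls in zip(weights, means, log_sigmas))
        for d in range(dims)
    ]
    variance = [s - m * m for s, m in zip(second_moment, mean)]
    return mean, variance
```

Under these assumptions, only the pose label enters the loss: the variances are never supervised directly, yet minimising the likelihood pushes them to grow where the prediction error is large. A high mixture variance could then be thresholded to flag a likely failure case.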