Accurate and robust localization is a fundamental requirement for mobile agents. Visual-inertial odometry (VIO) algorithms exploit information from camera and inertial sensors to estimate position and orientation. Recent deep-learning-based VIO models have attracted attention because they provide pose information in a data-driven way, without the need to design hand-crafted algorithms. Existing learning-based VIO models rely on recurrent networks to fuse multimodal data and process sensor signals, which makes them hard to train and insufficiently efficient. We propose a novel learning-based VIO framework with external memory attention that effectively and efficiently combines visual and inertial features for state estimation. The proposed model estimates pose accurately and robustly even in challenging scenarios, e.g., on overcast days and over water-filled ground, where traditional VIO algorithms struggle to extract visual features. Experiments validate that it outperforms both traditional and learning-based VIO baselines across different scenes.
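To illustrate the general idea of memory-attention-based multimodal fusion described above, here is a minimal NumPy sketch. All names, dimensions, and the single-head dot-product formulation are illustrative assumptions, not the paper's actual architecture: the fused visual-inertial feature is built as an attention-weighted read over a bank of external memory slots.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_attention_fuse(visual_feat, inertial_feat, memory):
    """Fuse visual and inertial features via attention over external memory.

    visual_feat:   (d_v,) visual feature vector (e.g., from a CNN encoder)
    inertial_feat: (d_i,) inertial feature vector (e.g., from an IMU encoder)
    memory:        (num_slots, d_v + d_i) external memory bank
    Returns a fused feature of shape (d_v + d_i,).
    """
    # Concatenated modalities form the query; memory slots act as keys/values.
    query = np.concatenate([visual_feat, inertial_feat], axis=-1)
    # Scaled dot-product attention scores over the memory slots.
    scores = memory @ query / np.sqrt(query.shape[-1])
    weights = softmax(scores)
    # Attention-weighted read from memory yields the fused state feature.
    return weights @ memory

# Illustrative usage with random features (dimensions are hypothetical).
rng = np.random.default_rng(0)
fused = memory_attention_fuse(rng.normal(size=64),   # visual feature
                              rng.normal(size=32),   # inertial feature
                              rng.normal(size=(16, 96)))  # 16 memory slots
```

In a full model, a downstream regression head would map the fused feature to a 6-DoF pose increment; that head and the encoders are omitted here for brevity.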