Visual-inertial localization is a key problem in computer vision and robotics applications such as virtual reality, self-driving cars, and aerial vehicles. The goal is to estimate an accurate pose of an object when either the environment or the dynamics are known. Absolute pose regression (APR) techniques directly regress the absolute pose from an image input in a known scene using convolutional and spatio-temporal networks. Odometry methods perform relative pose regression (RPR), predicting the relative pose from known object dynamics (visual or inertial inputs). The localization task can be improved by fusing information from both data sources in a cross-modal setup, which is challenging because the tasks are partly contradictory. In this work, we conduct a benchmark that evaluates deep multimodal fusion based on pose graph optimization and attention networks. Auxiliary and Bayesian learning are utilized for the APR task. We show accuracy improvements for the APR-RPR task and for the RPR-RPR task on aerial vehicles and hand-held devices. We conduct experiments on the EuRoC MAV and PennCOSYVIO datasets, and we record and evaluate a novel industry dataset.