In this paper, we investigate visual-based camera re-localization with neural networks for robotics and autonomous vehicle applications. Our solution is a CNN-based algorithm that predicts the camera pose (3D translation and 3D rotation) directly from a single image. It also provides an uncertainty estimate of the pose. Pose and uncertainty are learned together with a single loss function and are fused at test time with an Extended Kalman Filter (EKF). Furthermore, we propose a new fully convolutional architecture, named CoordiNet, designed to embed some of the scene geometry. Our framework outperforms comparable methods on the largest available benchmark, the Oxford RobotCar dataset, with an average error of 8 meters where the previous best was 19 meters. We also investigate the performance of our method on large scenes for real-time (18 fps) vehicle localization. In this setup, structure-based methods require a large database, and we show that our proposal is a reliable alternative, achieving a 29 cm median error on a 1.9 km loop in a busy urban area.
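As an illustration of how pose and uncertainty can be learned with a single loss function, here is a minimal PyTorch sketch of a heteroscedastic negative log-likelihood loss in the style commonly used for uncertainty-aware pose regression. This is an assumed formulation for clarity, not necessarily the exact loss used by CoordiNet; the function name, tensor shapes, and the Laplacian likelihood are our own choices.

```python
import torch

def pose_nll_loss(pred_pose: torch.Tensor,
                  log_sigma: torch.Tensor,
                  gt_pose: torch.Tensor) -> torch.Tensor:
    """Joint pose + uncertainty loss (hypothetical sketch).

    pred_pose: (B, 6) predicted translation (3) and rotation (3)
    log_sigma: (B, 6) predicted log-scale of the uncertainty, one per component
    gt_pose:   (B, 6) ground-truth pose

    Laplacian negative log-likelihood: each residual is weighted by the
    predicted uncertainty, and the log term penalizes the network for
    inflating sigma to suppress the residual. Minimizing this trains the
    pose and its uncertainty jointly with a single loss.
    """
    residual = torch.abs(pred_pose - gt_pose)
    return (residual * torch.exp(-log_sigma) + log_sigma).mean()
```

At test time, the predicted uncertainties can serve as the measurement noise covariance when fusing successive pose estimates in an EKF, which is the role the abstract describes.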