6-DoF visual localization systems use principled approaches rooted in 3D geometry to accurately estimate the camera pose of an image relative to a map. Current techniques use hierarchical pipelines and learned 2D feature extractors to improve scalability and increase performance. However, despite gains in typical metrics such as recall@0.25m, these systems still have limited utility for real-world applications like autonomous vehicles because of their 'worst' areas of performance: the locations where they provide insufficient recall at a certain required error tolerance. Here we investigate the utility of using 'place specific configurations', where a map is segmented into a number of places, each with its own configuration for modulating the pose estimation step, in this case selecting a camera within a multi-camera system. On the Ford AV benchmark dataset, we demonstrate substantially improved worst-case localization performance compared to using off-the-shelf pipelines, minimizing the percentage of the dataset which has low recall at a certain error tolerance, as well as improved overall localization performance. Our proposed approach is particularly applicable to the crowdsharing model of autonomous vehicle deployment, where a fleet of AVs regularly traverses a known route.
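The idea of a place-specific configuration can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy error values, the selection rule (pick the camera with the highest recall on a calibration traversal) and the 0.75 recall floor are all assumptions made for illustration. Only the 0.25 m tolerance is taken from the recall@0.25m metric mentioned above.

```python
from collections import defaultdict

def select_camera_per_place(calib, tol_m=0.25):
    """Pick, for each place, the camera with the highest recall at tol_m.

    calib: iterable of (place_id, camera_id, translation_error_m) tuples
    from a hypothetical calibration traversal.
    Returns {place_id: (camera_id, recall)}.
    """
    hits = defaultdict(lambda: defaultdict(list))
    for place, cam, err in calib:
        hits[place][cam].append(err <= tol_m)
    config = {}
    for place, cams in hits.items():
        recall = {cam: sum(v) / len(v) for cam, v in cams.items()}
        # sorted() makes tie-breaking deterministic (alphabetical camera id)
        best = max(sorted(recall), key=recall.get)
        config[place] = (best, recall[best])
    return config

def worst_case_fraction(place_recalls, min_recall):
    """Fraction of places whose recall falls below min_recall."""
    values = list(place_recalls)
    return sum(r < min_recall for r in values) / len(values)

# Toy data (hypothetical): two places, two cameras, errors in metres.
calib = [
    ("A", "front", 0.10), ("A", "front", 0.40),
    ("A", "rear", 0.10), ("A", "rear", 0.20),
    ("B", "front", 0.05), ("B", "front", 0.10),
    ("B", "rear", 0.90), ("B", "rear", 0.30),
]

config = select_camera_per_place(calib)

# Baseline: always use the front camera, regardless of place.
front_only = {"A": 0.5, "B": 1.0}
baseline_bad = worst_case_fraction(front_only.values(), min_recall=0.75)
specific_bad = worst_case_fraction((r for _, r in config.values()), min_recall=0.75)
```

In this toy case the fixed front camera leaves half of the places below the recall floor, while the per-place selection leaves none. A real evaluation would measure recall on traversals held out from the one used to choose the configuration.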