As the autonomous driving industry slowly matures, visual map localization is quickly becoming the standard approach to localize cars as accurately as possible. Owing to the rich data returned by visual sensors such as cameras or LiDARs, researchers are able to build different types of maps with various levels of detail, and use them to achieve high levels of vehicle localization accuracy and stability in urban environments. In contrast to popular SLAM approaches, visual map localization relies on pre-built maps, and is focused solely on improving localization accuracy by avoiding error accumulation or drift. We define visual map localization as a two-stage process. At the stage of place recognition, the initial position of the vehicle in the map is determined by comparing the visual sensor output with a set of geo-tagged map regions of interest. Subsequently, at the stage of map metric localization, the vehicle is tracked while it moves across the map by continuously aligning the visual sensors' output with the area of the map that is currently being traversed. In this paper, we survey, discuss and compare the latest methods for LiDAR-based, camera-based and cross-modal visual map localization for both stages, in an effort to highlight the strengths and weaknesses of each approach.
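To make the two-stage definition concrete, the following is a minimal sketch and not a method taken from the surveyed literature. It assumes that each geo-tagged map region is summarized by a global descriptor vector, that place recognition can be approximated by a cosine-similarity nearest-neighbor search, and that metric localization can be approximated by a 2D rigid alignment (Kabsch algorithm) between sensor points and map points with known correspondences. The function names `recognize_place` and `align_to_map` are hypothetical and chosen for illustration only.

```python
import numpy as np


def recognize_place(query_desc, map_descs, map_positions):
    """Stage 1 (place recognition): pick the geo-tagged map region whose
    descriptor has the highest cosine similarity with the query descriptor."""
    q = query_desc / np.linalg.norm(query_desc)
    m = map_descs / np.linalg.norm(map_descs, axis=1, keepdims=True)
    best = int(np.argmax(m @ q))
    return best, map_positions[best]


def align_to_map(scan_pts, map_pts):
    """Stage 2 (map metric localization): estimate the 2D rigid transform
    (R, t) such that R @ scan + t matches the map points.
    Assumes scan_pts[i] corresponds to map_pts[i] (Kabsch algorithm)."""
    scan_c = scan_pts.mean(axis=0)
    map_c = map_pts.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (scan_pts - scan_c).T @ (map_pts - map_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = map_c - R @ scan_c
    return R, t
```

In this sketch, `recognize_place` would be called once to initialize the vehicle position, and `align_to_map` would then be called on every new sensor frame against the map area around the latest pose estimate. Real systems must also establish the point correspondences, which the sketch takes as given.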