Accurate and reliable localization is a fundamental requirement for autonomous vehicles to use map information in higher-level tasks such as navigation or planning. In this paper, we present a novel approach to vehicle localization in dense semantic maps, including vectorized high-definition maps or 3D meshes, using semantic segmentation from a monocular camera. We formulate the localization task as a direct image alignment problem on semantic images, which allows our approach to robustly track the vehicle pose in semantically labeled maps by aligning virtual camera views rendered from the map to sequences of semantically segmented camera images. In contrast to existing visual localization approaches, the system does not require additional keypoint features, handcrafted localization landmark extractors or expensive LiDAR sensors. We demonstrate the wide applicability of our method on a diverse set of semantic mesh maps generated from stereo or LiDAR as well as manually annotated HD maps and show that it achieves reliable and accurate localization in real-time.
翻译:准确和可靠的本地化是自主飞行器在导航或规划等更高层次任务中使用地图信息的基本要求。在本文中,我们展示了在密集的语义图中,包括矢量高清晰地图或3D meshes,使用单筒照相机的语义分解法,对车辆本地化采用新颖的方法。我们将本地化任务作为语义图像的一个直接图像匹配问题来设计,这样我们的方法就可以通过将地图上提供的虚拟相机视图与语义断面相机图像的序列相匹配,从而强有力地跟踪车辆在语义标签地图中呈现的图像。与现有的视觉本地化方法不同,该系统不需要额外的关键点特征、手制本地化地标提取器或昂贵的LIDAR传感器。我们展示了我们的方法在从立体或LIDAR生成的多种语义网域图以及人工加注的HD地图上的广泛适用性,并表明它实现了实时可靠和准确的本地化。