LiDAR-based place recognition (LPR) is one of the most crucial components of autonomous vehicles to identify previously visited places in GPS-denied environments. Most existing LPR methods use mundane representations of the input point cloud without considering different views, which may not fully exploit the information from LiDAR sensors. In this paper, we propose a cross-view transformer-based network, dubbed CVTNet, to fuse the range image views (RIVs) and bird's eye views (BEVs) generated from the LiDAR data. It extracts correlations within the views themselves using intra-transformers and between the two different views using inter-transformers. Based on that, our proposed CVTNet generates a yaw-angle-invariant global descriptor for each laser scan end-to-end online and retrieves previously seen places by descriptor matching between the current query scan and the pre-built database. We evaluate our approach on three datasets collected with different sensor setups and environmental conditions. The experimental results show that our method outperforms the state-of-the-art LPR methods with strong robustness to viewpoint changes and long-time spans. Furthermore, our approach has a good real-time performance that can run faster than the typical LiDAR frame rate. The implementation of our method is released as open source at: https://github.com/BIT-MJY/CVTNet.
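As a rough illustration of the retrieval step described above (matching the query scan's global descriptor against a pre-built database of descriptors), the sketch below performs a top-k nearest-neighbor search by cosine similarity. The descriptor dimension, function names, and the use of NumPy are assumptions for illustration only, not the paper's actual implementation.

```python
import numpy as np

def retrieve_top_k(query_desc: np.ndarray, database: np.ndarray, k: int = 5):
    """Return indices and scores of the k most similar database descriptors.

    query_desc: (D,) global descriptor of the current scan (D assumed, e.g. 256).
    database:   (N, D) descriptors of previously visited places.
    """
    # Normalize so that the dot product equals cosine similarity.
    q = query_desc / (np.linalg.norm(query_desc) + 1e-12)
    db = database / (np.linalg.norm(database, axis=1, keepdims=True) + 1e-12)
    sims = db @ q                      # (N,) cosine similarities
    top_k = np.argsort(-sims)[:k]      # indices of best-matching places
    return top_k, sims[top_k]

# Example usage with random placeholder descriptors (no real scans involved).
rng = np.random.default_rng(0)
database = rng.standard_normal((1000, 256)).astype(np.float32)
query = rng.standard_normal(256).astype(np.float32)
indices, scores = retrieve_top_k(query, database, k=5)
print(indices, scores)
```

Because the descriptor is yaw-angle-invariant by construction, a plain similarity search like this suffices for place retrieval without aligning scan orientations first.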