Recent years have witnessed the increasing application of place recognition in various environments, such as city roads, large buildings, and mixed indoor-outdoor places. The task nevertheless remains challenging because of the limitations of individual sensors and the changing appearance of environments. Existing works either rely on a single sensor or simply combine different sensors, ignoring the fact that the relative importance of each sensor varies as the environment changes. In this paper, an adaptive weighting visual-LiDAR fusion method, named AdaFusion, is proposed to learn weights for both image and point cloud features. The features of the two modalities thus contribute differently according to the current environmental situation. The weights are learned by an attention branch of the network, whose output is then fused with that of the multi-modality feature extraction branch. Furthermore, to better exploit the potential relationship between images and point clouds, we design a two-stage fusion approach that combines 2D and 3D attention. Our work is tested on two public datasets, and experiments show that the adaptive weights help improve recognition accuracy and system robustness to varying environments.
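To make the idea of adaptive weighting concrete, the following is a minimal sketch of how per-modality weights could scale image and point cloud descriptors before they are combined. It is not the AdaFusion architecture: the function names `softmax` and `adaptive_fusion`, the softmax normalization of the weights, the concatenation of the weighted features, and the final L2 normalization are all assumptions made for illustration, and the attention logits are supplied by hand here rather than predicted by a learned attention branch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(img_feat, pc_feat, attn_logits):
    """Hypothetical weighted fusion of two modality descriptors.

    img_feat, pc_feat: lists of floats standing in for the image and
        point cloud features produced by a feature extraction branch.
    attn_logits: two scores standing in for the output of an attention
        branch (assumed here; in a real network they would be learned).
    Returns the L2-normalized fused descriptor and the two weights.
    """
    # Assumption: weights are normalized with a softmax so they sum to 1.
    w_img, w_pc = softmax(attn_logits)
    # Assumption: weighted features are combined by concatenation.
    fused = [w_img * v for v in img_feat] + [w_pc * v for v in pc_feat]
    # Normalize so descriptors can be compared by distance; guard against
    # an all-zero vector.
    norm = math.sqrt(sum(v * v for v in fused)) or 1.0
    return [v / norm for v in fused], (w_img, w_pc)


# Example: logits that favor the point cloud, as might be appropriate
# when images are unreliable (for example, in poor lighting).
img_feat = [0.2, 0.9, 0.1]
pc_feat = [0.5, 0.4]
fused, weights = adaptive_fusion(img_feat, pc_feat, attn_logits=[-1.0, 1.0])
```

In this sketch, changing `attn_logits` shifts how much each modality contributes to the fused descriptor, which is the behavior the abstract attributes to the learned weights; how the weights are actually computed and how the two-stage 2D/3D attention is combined are left unspecified here.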