We describe an approach to large-scale indoor place recognition that aggregates low-level colour and geometric features with high-level semantic features. We use a deep learning network that takes RGB point clouds as input and extracts local features with five 3-D kernel point convolutional (KPConv) layers. We train the KPConv layers on the semantic segmentation task specifically to ensure that the extracted local features are semantically meaningful. Feature maps from all five KPConv layers are then concatenated and fed into a NetVLAD layer to generate global descriptors. The approach is trained and evaluated using a large-scale indoor place recognition dataset derived from the ScanNet dataset, with a test set comprising 3,608 point clouds generated from 100 different rooms. Comparison with a traditional feature-based method and three state-of-the-art deep learning methods demonstrates that the approach significantly outperforms all four, achieving, for example, a top-3 average recall rate of 75% compared with 41% for the closest rival method.
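The abstract reports a top-3 average recall rate without defining it. The sketch below shows one common way such a metric is computed in place recognition, under the assumption that a query counts as correctly recognised when at least one of its K nearest database descriptors comes from the same place. The function names, the Euclidean distance, and the toy two-dimensional descriptors are illustrative assumptions and are not taken from the paper.

```python
from math import sqrt


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_recall_at_k(queries, database, k=3):
    """Fraction of queries with a same-place match among the k nearest entries.

    queries, database: lists of (descriptor, place_id) pairs.
    Assumed definition: a query is a hit if any of its k nearest
    database descriptors shares its place_id.
    """
    if not queries:
        raise ValueError("at least one query is required")
    hits = 0
    for q_desc, q_place in queries:
        # Rank the database by distance to the query descriptor.
        ranked = sorted(database, key=lambda item: l2_distance(q_desc, item[0]))
        top_places = [place for _, place in ranked[:k]]
        if q_place in top_places:
            hits += 1
    return hits / len(queries)


# Toy data (hypothetical): 2-D descriptors labelled with room names.
database = [
    ([0.0, 0.0], "kitchen"),
    ([1.0, 0.0], "office"),
    ([0.0, 1.0], "bedroom"),
    ([5.0, 5.0], "hall"),
]
queries = [
    ([0.1, 0.0], "kitchen"),  # nearest entry is the kitchen: a hit
    ([4.0, 4.0], "kitchen"),  # kitchen is the farthest entry: a miss at k=3
]

recall_at_3 = average_recall_at_k(queries, database, k=3)
```

In a real evaluation the descriptors would be the global descriptors produced by the network, and the database would exclude the query itself.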