Visual place recognition is one of the essential and challenging problems in robotics. In this letter, we explore, for the first time, the multi-modal fusion of semantic and visual modalities in a dynamics-invariant space to improve place recognition in dynamic environments. We achieve this by first designing a novel deep learning architecture that generates the static semantic segmentation and recovers the static image directly from the corresponding dynamic image. We then leverage the spatial-pyramid-matching model to encode the static semantic segmentation into feature vectors. In parallel, the static image is encoded using the popular Bag-of-Words model. On the basis of these multi-modal features, we measure the similarity between the query image and each target landmark as the joint similarity of their semantic and visual codes. Extensive experiments demonstrate the effectiveness and robustness of the proposed approach for place recognition in dynamic environments.
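To make the encoding and matching steps concrete, the following is a minimal sketch (not the authors' implementation) of spatial-pyramid encoding over a semantic label grid, together with a weighted joint similarity between the resulting semantic code and a Bag-of-Words visual code. The number of semantic classes, the pyramid depth, and the fusion weight `alpha` are illustrative assumptions.

```python
# Sketch only: spatial-pyramid-matching (SPM) encoding of a static
# semantic segmentation, plus a joint semantic/visual similarity.
import math

NUM_LABELS = 4  # assumed number of static semantic classes

def spm_encode(labels, levels=2):
    """Concatenate per-cell label histograms over a small pyramid.

    `labels` is a 2-D grid of class indices; at pyramid level lv the
    grid is split into 2**lv x 2**lv cells and each cell contributes
    one histogram of NUM_LABELS bins.
    """
    h, w = len(labels), len(labels[0])
    code = []
    for lv in range(levels):
        cells = 2 ** lv  # cells per side at this level
        for cy in range(cells):
            for cx in range(cells):
                hist = [0] * NUM_LABELS
                for y in range(cy * h // cells, (cy + 1) * h // cells):
                    for x in range(cx * w // cells, (cx + 1) * w // cells):
                        hist[labels[y][x]] += 1
                code.extend(hist)
    return code

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def joint_similarity(sem_q, sem_t, vis_q, vis_t, alpha=0.5):
    """Weighted fusion of semantic and visual similarities (alpha assumed)."""
    return alpha * cosine(sem_q, sem_t) + (1 - alpha) * cosine(vis_q, vis_t)
```

A query image would be matched against each target landmark by computing `joint_similarity` over their precomputed SPM and Bag-of-Words codes and keeping the highest-scoring landmark.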