Simultaneous Localization and Mapping (SLAM) is one of the most essential techniques in many real-world robotic applications. Most SLAM algorithms assume a static environment, which, however, is rarely the case in practice. Recent work on semantic SLAM aims to understand the objects in an environment and distinguish dynamic information from the scene context by performing image-based segmentation. However, the segmentation results are often imperfect or incomplete, which can subsequently reduce the quality of mapping and the accuracy of localization. In this paper, we present a robust multi-modal semantic framework to solve the SLAM problem in complex and highly dynamic environments. We propose to learn a more powerful object feature representation and deploy the mechanism of looking and thinking twice in the backbone network, which leads to better recognition results than our baseline instance segmentation model. Moreover, geometric-only clustering and visual semantic information are combined to reduce the effect of segmentation errors caused by small-scale objects, occlusion, and motion blur. Thorough experiments have been conducted to evaluate the performance of the proposed method. The results show that our method can precisely identify dynamic objects under imperfect recognition and motion blur. Moreover, the proposed SLAM framework is able to efficiently build a static dense map at a processing rate of more than 10 Hz, which makes it applicable to many practical scenarios. Both the training data and the proposed method are open-sourced at https://github.com/wh200720041/MMS_SLAM.