Simultaneous localization and mapping (SLAM) is a critical technology that enables autonomous robots to be aware of their surrounding environment. With the development of deep learning, SLAM systems can achieve a higher level of perception of the environment, including the semantic and text levels. However, existing work is limited in its ability to achieve a natural-language level of perception of the world. To address this limitation, we propose LP-SLAM, the first language-perceptive SLAM system that leverages large language models (LLMs). LP-SLAM has two major features: (a) it can detect text in the scene and determine whether it represents a landmark to be stored during the tracking and mapping phase, and (b) it can understand natural-language input from humans and provide guidance based on the generated map. We illustrate three uses of the LLM in the system: text clustering, landmark judgment, and natural-language navigation. Our proposed system represents an advancement in the field of LLM-based SLAM and opens up new possibilities for autonomous robots to interact with their environment in a more natural and intuitive way.