LP-SLAM: Language-Perceptive RGB-D SLAM system based on Large Language Model


March 17, 2023


Weiyi Zhang, Yushi Guo, Liting Niu, Peijun Li, Chun Zhang, Zeyu Wan, Jiaxiang Yan, Fasih Ud Din Farrukh, Debing Zhang

From arXiv, 12 pages, 16 figures

Simultaneous localization and mapping (SLAM) is a critical technology that enables autonomous robots to be aware of their surrounding environment. With the development of deep learning, SLAM systems can achieve a higher level of perception of the environment, including the semantic and text levels. However, current works are limited in their ability to achieve a natural-language level of perception of the world. To address this limitation, we propose LP-SLAM, the first language-perceptive SLAM system that leverages large language models (LLMs). LP-SLAM has two major features: (a) it can detect text in the scene and determine whether it represents a landmark to be stored during the tracking and mapping phase, and (b) it can understand natural language input from humans and provide guidance based on the generated map. We illustrate three uses of the LLM in the system: text clustering, landmark judgment, and natural language navigation. Our proposed system represents an advancement in the field of LLM-based SLAM and opens up new possibilities for autonomous robots to interact with their environment in a more natural and intuitive way.
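To make the two features in the abstract concrete, here is a minimal Python sketch of how detected scene text might be filtered into landmarks and later matched against a natural-language request. It is not the authors' implementation: the names `TextLandmark`, `LandmarkMap`, `is_landmark_stub`, and `guidance` are hypothetical, the landmark judgment is a simple heuristic standing in for the LLM call, and the navigation step is plain substring matching rather than LLM-based understanding.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float, float]  # map-frame coordinates in metres (assumed)


@dataclass
class TextLandmark:
    text: str
    position: Point


def is_landmark_stub(text: str) -> bool:
    """Hypothetical stand-in for the LLM landmark judgment.

    Accepts text that has at least three characters and contains a letter.
    """
    cleaned = text.strip()
    return len(cleaned) >= 3 and any(ch.isalpha() for ch in cleaned)


class LandmarkMap:
    """Stores text landmarks that pass the judgment function."""

    def __init__(self, judge: Callable[[str], bool] = is_landmark_stub) -> None:
        self._judge = judge
        self.landmarks: List[TextLandmark] = []

    def observe(self, text: str, position: Point) -> bool:
        """Store detected text as a landmark if the judge accepts it."""
        if not self._judge(text):
            return False
        self.landmarks.append(TextLandmark(text.strip(), position))
        return True

    def find(self, request: str) -> Optional[TextLandmark]:
        """Return the first landmark whose text appears in the request."""
        lowered = request.lower()
        for landmark in self.landmarks:
            if landmark.text.lower() in lowered:
                return landmark
        return None


def guidance(landmark_map: LandmarkMap, robot_position: Point, request: str) -> str:
    """Produce a one-line guidance message from the stored landmarks."""
    target = landmark_map.find(request)
    if target is None:
        return "No matching landmark in the map."
    distance = math.dist(robot_position, target.position)
    return f"Head to '{target.text}', about {distance:.1f} m away."


if __name__ == "__main__":
    demo_map = LandmarkMap()
    demo_map.observe("Room 301", (3.0, 4.0, 0.0))  # accepted by the stub
    demo_map.observe("12", (1.0, 1.0, 0.0))        # rejected: too short, no letters
    print(guidance(demo_map, (0.0, 0.0, 0.0), "Take me to room 301"))
```

Passing the judge as a callable keeps the map logic independent of how the decision is made, so an LLM-backed function could replace the stub without touching `LandmarkMap`.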



SLAM

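Simultaneous localization and mapping (SLAM) is a technique that lets devices such as robots and autonomous vehicles build a map of an unknown environment (with no prior knowledge) or update the map of a known environment (where prior knowledge of the map is given), while simultaneously tracking their current position.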
