Visual target navigation in unknown environments is a crucial problem in robotics. Despite extensive investigation of classical and learning-based approaches in the past, robots still lack common-sense knowledge about household objects and layouts. Prior state-of-the-art approaches to this task learn such priors during training and typically require substantial computational resources and time. To address this, we propose a new framework for visual target navigation that leverages large language models (LLMs) to impart common sense for object search. Specifically, we introduce two paradigms, (i) a zero-shot approach and (ii) a feed-forward approach, that use language to identify the relevant frontier in the semantic map as a long-term goal and thereby explore the environment efficiently. Our analysis demonstrates notable zero-shot generalization and transfer capabilities arising from the use of language. Experiments on Gibson and Habitat-Matterport 3D (HM3D) show that the proposed framework significantly outperforms existing map-based methods in terms of success rate and generalization. Ablation analysis further indicates that the common-sense knowledge from the language model leads to more efficient semantic exploration. Finally, we provide a real-robot experiment to verify the applicability of our framework in real-world scenarios. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/l3mvn.