ESC: 带软通识约束的探索用于零样本物体导航 (ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation) - 专知论文

会员服务 ·

0

零样本 · 开放领域 · 约束 · 样本 · 预训练 ·

2023 年 3 月 27 日

ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

翻译：ESC: 带软通识约束的探索用于零样本物体导航

Kaiwen Zhou,Kaizhi Zheng,Connor Pryor,Yilin Shen,Hongxia Jin,Lise Getoor,Xin Eric Wang

The ability to accurately locate and navigate to a specific object is a crucial capability for embodied agents that operate in the real world and interact with objects to complete tasks. Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments. In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience nor any other training on the visual environments. First, ESC leverages a pre-trained vision and language model for open-world prompt-based grounding and a pre-trained commonsense language model for room and object reasoning. Then ESC converts commonsense knowledge into navigation actions by modeling it as soft logic predicates for efficient exploration. Extensive experiments on MP3D, HM3D, and RoboTHOR benchmarks show that our ESC method improves significantly over baselines, and achieves new state-of-the-art results for zero-shot object navigation (e.g., 158% relative Success Rate improvement than CoW on MP3D).

翻译：能够准确定位和导航到特定物体是赋予行动代理在现实世界中操作和与对象交互以完成任务的重要能力。这种物体导航任务通常需要在带标记的视觉环境中进行大规模训练，但很难推广到未知环境中的新物体。在本研究中，我们提出了一种新颖的零样本物体导航方法，即带软通识约束的探索（ESC），将预训练模型中的通识知识转移到对视觉环境没有导航经验的情况下的开放领域物体导航。首先，ESC利用预训练的视觉和语言模型进行开放领域基于提示的连接，并利用预训练的通识语言模型进行房间和物体推理。然后，ESC通过将通识知识建模为软逻辑谓词，将其转化为导航动作，以实现高效的探索。在MP3D、HM3D和RoboTHOR基准测试上进行了大量实验，结果表明我们的ESC方法显着优于基线，并在零样本物体导航方面取得了新的最佳结果（例如，在MP3D上相对成功率比CoW提高了158%）。

0

相关内容

零样本

【视觉和语言导航:任务、方法和未来方向的综述】Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

【视觉和语言导航:任务、方法和未来方向的综述】Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

专知会员服务

37+阅读 · 2022年3月25日

【Hugging Face】指导文本生成与约束波束搜索🤗Transformers，Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

【Hugging Face】指导文本生成与约束波束搜索🤗Transformers，Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

专知会员服务

22+阅读 · 2022年3月18日

【AAAI2021】利用先验知识对场景图进行分类

【AAAI2021】利用先验知识对场景图进行分类

专知会员服务

61+阅读 · 2020年12月3日

【CVPR2020】视觉导航的神经拓扑SLAM，56页ppt，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，56页ppt，Neural Topological SLAM for Visual Navigation

专知会员服务

14+阅读 · 2020年6月18日

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

专知会员服务

52+阅读 · 2020年5月26日

【IJCAI2020】从语言图谱到常识图谱，TransOMCS: From Linguistic Graphs to Commonsense Knowledge

【IJCAI2020】从语言图谱到常识图谱，TransOMCS: From Linguistic Graphs to Commonsense Knowledge

专知会员服务

26+阅读 · 2020年5月6日

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

专知会员服务

31+阅读 · 2020年3月11日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【清华大学】知识增强的常识性故事生成预训练模型，A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation

【清华大学】知识增强的常识性故事生成预训练模型，A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation

专知会员服务

52+阅读 · 2020年1月20日

【ICCV 2019 Workshop】Adaptive Confidence Smoothing for Generalized Zero-Shot Learning，巴伊兰大学 Yuval Atzmon

【ICCV 2019 Workshop】Adaptive Confidence Smoothing for Generalized Zero-Shot Learning，巴伊兰大学 Yuval Atzmon

专知会员服务

13+阅读 · 2019年10月31日

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

论文小综 | Using External Knowledge on VQA

论文小综 | Using External Knowledge on VQA

开放知识图谱

10+阅读 · 2020年10月18日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

泡泡机器人SLAM

13+阅读 · 2019年1月3日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【泡泡一分钟】神经SLAM：使用外部存储器让智能体学习探索环境

【泡泡一分钟】神经SLAM：使用外部存储器让智能体学习探索环境

泡泡机器人SLAM

12+阅读 · 2018年4月17日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

45+阅读 · 2015年12月31日

基于机器学习的室外未知环境中移动机器人定位研究

国家自然科学基金

4+阅读 · 2014年12月31日

惯性与高阶特征辅助的图像动态环境感知方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于分布式视觉的移动机器人大规模室内环境认知方法研究

国家自然科学基金

3+阅读 · 2012年12月31日

室内场景的三维感知与理解

国家自然科学基金

1+阅读 · 2012年12月31日

高效率反型叠层有机太阳能电池的制备及机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于集值函数描述的移动机器人自主行为基础问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

模-相对Hochschild同调与上同调

国家自然科学基金

0+阅读 · 2011年12月31日

户外轮式移动机器人对地形地貌特征的自主感知、地图创建和沿途定位

国家自然科学基金

4+阅读 · 2009年12月31日

基于机器视觉和惯性测量的轮式滑动转向移动机器人定位导航与遥感知

国家自然科学基金

0+阅读 · 2008年12月31日

Experiential Explanations for Reinforcement Learning

Arxiv

0+阅读 · 2023年5月16日

Style Transfer Enabled Sim2Real Framework for Efficient Learning of Robotic Ultrasound Image Analysis Using Simulated Data

Arxiv

0+阅读 · 2023年5月16日

Soft Prompt Decoding for Multilingual Dense Retrieval

Arxiv

0+阅读 · 2023年5月15日

Fast Traversability Estimation for Wild Visual Navigation

Arxiv

0+阅读 · 2023年5月15日

KEPR: Knowledge Enhancement and Plausibility Ranking for Generative Commonsense Question Answering

Arxiv

0+阅读 · 2023年5月15日

Knowledge Refinement via Interaction Between Search Engines and Large Language Models

Arxiv

0+阅读 · 2023年5月12日

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Arxiv

19+阅读 · 2022年5月13日

Dynamic neighbourhood optimisation for task allocation using multi-agent

Arxiv

101+阅读 · 2022年5月11日

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Arxiv

12+阅读 · 2021年2月7日

Transferring Common-Sense Knowledge for Object Detection

Arxiv

12+阅读 · 2018年4月3日

VIP会员

文章信息

相关主题

相关VIP内容

【视觉和语言导航:任务、方法和未来方向的综述】Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

【视觉和语言导航:任务、方法和未来方向的综述】Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

专知会员服务

37+阅读 · 2022年3月25日

【Hugging Face】指导文本生成与约束波束搜索🤗Transformers，Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

【Hugging Face】指导文本生成与约束波束搜索🤗Transformers，Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

专知会员服务

22+阅读 · 2022年3月18日

【AAAI2021】利用先验知识对场景图进行分类

【AAAI2021】利用先验知识对场景图进行分类

专知会员服务

61+阅读 · 2020年12月3日

【CVPR2020】视觉导航的神经拓扑SLAM，56页ppt，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，56页ppt，Neural Topological SLAM for Visual Navigation

专知会员服务

14+阅读 · 2020年6月18日

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

专知会员服务

52+阅读 · 2020年5月26日

【IJCAI2020】从语言图谱到常识图谱，TransOMCS: From Linguistic Graphs to Commonsense Knowledge

【IJCAI2020】从语言图谱到常识图谱，TransOMCS: From Linguistic Graphs to Commonsense Knowledge

专知会员服务

26+阅读 · 2020年5月6日

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

专知会员服务

31+阅读 · 2020年3月11日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【清华大学】知识增强的常识性故事生成预训练模型，A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation

【清华大学】知识增强的常识性故事生成预训练模型，A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation

专知会员服务

52+阅读 · 2020年1月20日

【ICCV 2019 Workshop】Adaptive Confidence Smoothing for Generalized Zero-Shot Learning，巴伊兰大学 Yuval Atzmon

【ICCV 2019 Workshop】Adaptive Confidence Smoothing for Generalized Zero-Shot Learning，巴伊兰大学 Yuval Atzmon

专知会员服务

13+阅读 · 2019年10月31日

热门VIP内容

开通专知VIP会员享更多权益服务

人机协同作战规划：来自美海军陆战队的大语言模型（LLM）使用教训

对北约军事总部战略规划制定与实施的研究 | 140页

美联参会指南-联合规划与执行概述及政策框架 | 32页

俄罗斯军事规划差异性凸显其思维的重要性 | 2025最新文献

相关资讯

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

论文小综 | Using External Knowledge on VQA

论文小综 | Using External Knowledge on VQA

开放知识图谱

10+阅读 · 2020年10月18日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

泡泡机器人SLAM

13+阅读 · 2019年1月3日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【泡泡一分钟】神经SLAM：使用外部存储器让智能体学习探索环境

【泡泡一分钟】神经SLAM：使用外部存储器让智能体学习探索环境

泡泡机器人SLAM

12+阅读 · 2018年4月17日

相关论文

Experiential Explanations for Reinforcement Learning

Arxiv

0+阅读 · 2023年5月16日

Style Transfer Enabled Sim2Real Framework for Efficient Learning of Robotic Ultrasound Image Analysis Using Simulated Data

Arxiv

0+阅读 · 2023年5月16日

Soft Prompt Decoding for Multilingual Dense Retrieval

Arxiv

0+阅读 · 2023年5月15日

Fast Traversability Estimation for Wild Visual Navigation

Arxiv

0+阅读 · 2023年5月15日

KEPR: Knowledge Enhancement and Plausibility Ranking for Generative Commonsense Question Answering

Arxiv

0+阅读 · 2023年5月15日

Knowledge Refinement via Interaction Between Search Engines and Large Language Models

Arxiv

0+阅读 · 2023年5月12日

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Arxiv

19+阅读 · 2022年5月13日

Dynamic neighbourhood optimisation for task allocation using multi-agent

Arxiv

101+阅读 · 2022年5月11日

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Arxiv

12+阅读 · 2021年2月7日

Transferring Common-Sense Knowledge for Object Detection

Arxiv

12+阅读 · 2018年4月3日

相关基金

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

45+阅读 · 2015年12月31日

基于机器学习的室外未知环境中移动机器人定位研究

国家自然科学基金

4+阅读 · 2014年12月31日

惯性与高阶特征辅助的图像动态环境感知方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于分布式视觉的移动机器人大规模室内环境认知方法研究

国家自然科学基金

3+阅读 · 2012年12月31日

室内场景的三维感知与理解

国家自然科学基金

1+阅读 · 2012年12月31日

高效率反型叠层有机太阳能电池的制备及机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于集值函数描述的移动机器人自主行为基础问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

模-相对Hochschild同调与上同调

国家自然科学基金

0+阅读 · 2011年12月31日

户外轮式移动机器人对地形地貌特征的自主感知、地图创建和沿途定位

国家自然科学基金

4+阅读 · 2009年12月31日

基于机器视觉和惯性测量的轮式滑动转向移动机器人定位导航与遥感知

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员