愿景和语言导航:在实际环境中解释视觉定位导航指示 (Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments) - 专知论文

会员服务 ·

0

Vision · 回合 · 视觉问答 · 机器人 · 数据集 ·

2017 年 11 月 24 日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

翻译：愿景和语言导航:在实际环境中解释视觉定位导航指示

Peter Anderson,Qi Wu,Damien Teney,Jake Bruce,Mark Johnson,Niko Sünderhauf,Ian Reid,Stephen Gould,Anton van den Hengel

A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset.

翻译：能够进行自然语言教学的机器人自Jetsons漫画系列想象由一组关注的机器人助推员组成的休闲生活之前就是一个梦想,它是一个仍然顽固遥远的梦想。然而,在视觉和语言方法方面最近的进展在密切相关的领域取得了令人难以置信的进展。这很重要,因为机器人正在根据所看到的情况解释一种与视觉问答相似的视觉和语言教学过程。这两项任务都可以被解释为视觉基础序列到序列翻译问题,许多同样的方法也是适用的。为了能够并鼓励应用视觉和语言方法来解释视觉定位导航指示的问题,我们介绍了Mentport3D模拟器 -- -- 一种基于真实图像的大规模强化学习环境。我们利用这个模拟器,可以在未来支持一系列具有内涵的视觉和语言任务。我们为真实建筑的视觉基础自然语言导航提供了第一个基准数据集 -- -- 室到空间(R2R)数据集。

3

相关内容

Vision

【视频预测深度学习综述论文】A Review on Deep Learning Techniques for Video Prediction

【视频预测深度学习综述论文】A Review on Deep Learning Techniques for Video Prediction

专知会员服务

52+阅读 · 2020年4月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

专知会员服务

31+阅读 · 2020年3月11日

机器视觉在织物疵点检测上的应用研究综述，Analysis on Application of Machine Vision in Fabric Defect Detection

机器视觉在织物疵点检测上的应用研究综述，Analysis on Application of Machine Vision in Fabric Defect Detection

专知会员服务

18+阅读 · 2020年2月16日

【CVPR 2019 | tutorial】计算机视觉的深度强化学习：Deep Reinforcement Learning for Computer Vision

【CVPR 2019 | tutorial】计算机视觉的深度强化学习：Deep Reinforcement Learning for Computer Vision

专知会员服务

55+阅读 · 2019年11月28日

【AAAI2020论文】使用GANs生成科学文章的关键短语（Keyphrase Generation for Scientific Articles using GANs）

【AAAI2020论文】使用GANs生成科学文章的关键短语（Keyphrase Generation for Scientific Articles using GANs）

专知会员服务

22+阅读 · 2019年11月15日

【ICCV 2019 Toturial】Interpretable Machine Learning for Computer Vision（用于计算机视觉的可解释性机器学习）

【ICCV 2019 Toturial】Interpretable Machine Learning for Computer Vision（用于计算机视觉的可解释性机器学习）

专知会员服务

32+阅读 · 2019年10月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

Call4Papers

10+阅读 · 2019年6月24日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Github 项目推荐 | 真实全景图像强化学习 AI 平台 —— Matterport3DSimulator

Github 项目推荐 | 真实全景图像强化学习 AI 平台 —— Matterport3DSimulator

AI研习社

9+阅读 · 2018年3月6日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

A Survey on Trajectory Data Management, Analytics, and Learning

A Survey on Trajectory Data Management, Analytics, and Learning

Arxiv

16+阅读 · 2020年3月25日

Directions for Explainable Knowledge-Enabled Systems

Directions for Explainable Knowledge-Enabled Systems

Arxiv

26+阅读 · 2020年3月17日

Visual Grounding in Video for Unsupervised Word Translation

Visual Grounding in Video for Unsupervised Word Translation

Arxiv

7+阅读 · 2020年3月11日

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

Arxiv

5+阅读 · 2019年6月18日

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Arxiv

3+阅读 · 2019年3月6日

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Arxiv

9+阅读 · 2018年11月25日

Fooling Vision and Language Models Despite Localization and Attention Mechanism

Arxiv

7+阅读 · 2018年4月6日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Arxiv

5+阅读 · 2018年4月5日

Not-So-CLEVR: Visual Relations Strain Feedforward Neural Networks

Arxiv

6+阅读 · 2018年2月12日

Game of Sketches: Deep Recurrent Models of Pictionary-style Word Guessing

Arxiv

5+阅读 · 2018年1月29日

VIP会员

文章信息

相关主题

相关VIP内容

【视频预测深度学习综述论文】A Review on Deep Learning Techniques for Video Prediction

【视频预测深度学习综述论文】A Review on Deep Learning Techniques for Video Prediction

专知会员服务

52+阅读 · 2020年4月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

专知会员服务

31+阅读 · 2020年3月11日

机器视觉在织物疵点检测上的应用研究综述，Analysis on Application of Machine Vision in Fabric Defect Detection

机器视觉在织物疵点检测上的应用研究综述，Analysis on Application of Machine Vision in Fabric Defect Detection

专知会员服务

18+阅读 · 2020年2月16日

【CVPR 2019 | tutorial】计算机视觉的深度强化学习：Deep Reinforcement Learning for Computer Vision

【CVPR 2019 | tutorial】计算机视觉的深度强化学习：Deep Reinforcement Learning for Computer Vision

专知会员服务

55+阅读 · 2019年11月28日

【AAAI2020论文】使用GANs生成科学文章的关键短语（Keyphrase Generation for Scientific Articles using GANs）

【AAAI2020论文】使用GANs生成科学文章的关键短语（Keyphrase Generation for Scientific Articles using GANs）

专知会员服务

22+阅读 · 2019年11月15日

【ICCV 2019 Toturial】Interpretable Machine Learning for Computer Vision（用于计算机视觉的可解释性机器学习）

【ICCV 2019 Toturial】Interpretable Machine Learning for Computer Vision（用于计算机视觉的可解释性机器学习）

专知会员服务

32+阅读 · 2019年10月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《人工智能绝不能完全自主》

《人工智能的法律与伦理：军事自主机器独特挑战的深度剖析》316页

从数据到主导：AI与兵棋推演构筑决策优势

《特洛伊木马货柜：武器化集装箱的战略威胁》最新报告

相关资讯

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

Call4Papers

10+阅读 · 2019年6月24日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Github 项目推荐 | 真实全景图像强化学习 AI 平台 —— Matterport3DSimulator

Github 项目推荐 | 真实全景图像强化学习 AI 平台 —— Matterport3DSimulator

AI研习社

9+阅读 · 2018年3月6日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

相关论文

A Survey on Trajectory Data Management, Analytics, and Learning

A Survey on Trajectory Data Management, Analytics, and Learning

Arxiv

16+阅读 · 2020年3月25日

Directions for Explainable Knowledge-Enabled Systems

Directions for Explainable Knowledge-Enabled Systems

Arxiv

26+阅读 · 2020年3月17日

Visual Grounding in Video for Unsupervised Word Translation

Visual Grounding in Video for Unsupervised Word Translation

Arxiv

7+阅读 · 2020年3月11日

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

Arxiv

5+阅读 · 2019年6月18日

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Arxiv

3+阅读 · 2019年3月6日

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Arxiv

9+阅读 · 2018年11月25日

Fooling Vision and Language Models Despite Localization and Attention Mechanism

Arxiv

7+阅读 · 2018年4月6日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Arxiv

5+阅读 · 2018年4月5日

Not-So-CLEVR: Visual Relations Strain Feedforward Neural Networks

Arxiv

6+阅读 · 2018年2月12日

Game of Sketches: Deep Recurrent Models of Pictionary-style Word Guessing

Arxiv

5+阅读 · 2018年1月29日

微信扫码咨询专知VIP会员