加强跨模式匹配和自控模拟学习,促进愿景语言导航 (Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation) - 专知论文

会员服务 ·

0

Performer · 学成 · 评论员 · 回合 · SPL ·

2018 年 11 月 25 日

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

翻译：加强跨模式匹配和自控模拟学习,促进愿景语言导航

Xin Wang,Qiuyuan Huang,Asli Celikyilmaz,Jianfeng Gao,Dinghan Shen,Yuan-Fang Wang,William Yang Wang,Lei Zhang

from arxiv, Technical report

Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. In this paper, we study how to address three critical challenges for this task: the cross-modal grounding, the ill-posed feedback, and the generalization problems. First, we propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL). Particularly, a matching critic is used to provide an intrinsic reward to encourage global matching between instructions and trajectories, and a reasoning navigator is employed to perform cross-modal grounding in the local visual scene. Evaluation on a VLN benchmark dataset shows that our RCM model significantly outperforms existing methods by 10% on SPL and achieves the new state-of-the-art performance. To improve the generalizability of the learned policy, we further introduce a Self-Supervised Imitation Learning (SIL) method to explore unseen environments by imitating its own past, good decisions. We demonstrate that SIL can approximate a better and more efficient policy, which tremendously minimizes the success rate performance gap between seen and unseen environments (from 30.7% to 11.7%).

翻译：视觉语言导航(VLN)是引导一个具有内涵的代理人在真实的 3D 环境中执行自然语言指令的任务。在本文中,我们研究如何应对这一任务的三个关键挑战:跨模式地面、错误的反馈和一般化问题。首先,我们提出一个新的加强跨模式匹配(RCM)方法,通过强化学习(RL)在当地和全球实施跨模式地面。特别是,一个匹配的批评家被用来提供内在的奖励,鼓励在指令和轨迹之间进行全球匹配,并使用一个推理导航员在当地视觉场进行跨模式地面定位。对VLN基准数据集的评价表明,我们的RCM模型大大超越了现有方法10%的SPL,并实现了新的状态性能。为了提高学习政策的普遍性,我们进一步引入了一种自我超模的模拟学习(SIL)方法,通过模仿其自身的过去、良好的决定来探索隐蔽环境。我们证明,SLMMM模式在11 中可以更好地看到从最短的成绩到更短的30 % 环境之间。

9

相关内容

Performer

图解FixMatch的半监督学习，The Illustrated FixMatch for Semi-Supervised Learning

图解FixMatch的半监督学习，The Illustrated FixMatch for Semi-Supervised Learning

专知会员服务

26+阅读 · 2020年4月2日

【CVPR2020-牛津-谷歌】语音到动作:动作识别的跨模态监督，Cross-modal Supervision

【CVPR2020-牛津-谷歌】语音到动作:动作识别的跨模态监督，Cross-modal Supervision

专知会员服务

24+阅读 · 2020年3月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

专知会员服务

31+阅读 · 2020年3月11日

【Google】无监督机器翻译，Unsupervised Machine Translation

【Google】无监督机器翻译，Unsupervised Machine Translation

专知会员服务

36+阅读 · 2020年3月3日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

计算机视觉领域顶会CVPR 2018 接受论文列表

计算机视觉领域顶会CVPR 2018 接受论文列表

专知

7+阅读 · 2018年5月26日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【推荐】直接未来预测：增强学习监督学习

【推荐】直接未来预测：增强学习监督学习

机器学习研究会

6+阅读 · 2017年11月24日

Unified Vision-Language Pre-Training for Image Captioning and VQA

Unified Vision-Language Pre-Training for Image Captioning and VQA

Arxiv

8+阅读 · 2019年10月3日

Adversarial Representation Learning for Text-to-Image Matching

Adversarial Representation Learning for Text-to-Image Matching

Arxiv

6+阅读 · 2019年8月28日

Reward learning from human preferences and demonstrations in Atari

Arxiv

8+阅读 · 2018年11月15日

Supervised and Unsupervised Transfer Learning for Question Answering

Arxiv

4+阅读 · 2018年4月21日

Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

Arxiv

6+阅读 · 2018年4月15日

IQA: Visual Question Answering in Interactive Environments

Arxiv

5+阅读 · 2018年4月5日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Arxiv

5+阅读 · 2018年4月5日

Mix-and-Match Tuning for Self-Supervised Semantic Segmentation

Arxiv

8+阅读 · 2018年1月30日

Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation

Arxiv

4+阅读 · 2018年1月29日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Arxiv

3+阅读 · 2017年11月24日

VIP会员

文章信息

相关主题

相关VIP内容

图解FixMatch的半监督学习，The Illustrated FixMatch for Semi-Supervised Learning

图解FixMatch的半监督学习，The Illustrated FixMatch for Semi-Supervised Learning

专知会员服务

26+阅读 · 2020年4月2日

【CVPR2020-牛津-谷歌】语音到动作:动作识别的跨模态监督，Cross-modal Supervision

【CVPR2020-牛津-谷歌】语音到动作:动作识别的跨模态监督，Cross-modal Supervision

专知会员服务

24+阅读 · 2020年3月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

专知会员服务

31+阅读 · 2020年3月11日

【Google】无监督机器翻译，Unsupervised Machine Translation

【Google】无监督机器翻译，Unsupervised Machine Translation

专知会员服务

36+阅读 · 2020年3月3日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【新书】基于物理的模拟

流匹配在生物学与生命科学中的应用综述

高质量数据集实践指南（1.0）

ICML 2025 关于语言模型机械可解释性的教程

相关资讯

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

计算机视觉领域顶会CVPR 2018 接受论文列表

计算机视觉领域顶会CVPR 2018 接受论文列表

专知

7+阅读 · 2018年5月26日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【推荐】直接未来预测：增强学习监督学习

【推荐】直接未来预测：增强学习监督学习

机器学习研究会

6+阅读 · 2017年11月24日

相关论文

Unified Vision-Language Pre-Training for Image Captioning and VQA

Unified Vision-Language Pre-Training for Image Captioning and VQA

Arxiv

8+阅读 · 2019年10月3日

Adversarial Representation Learning for Text-to-Image Matching

Adversarial Representation Learning for Text-to-Image Matching

Arxiv

6+阅读 · 2019年8月28日

Reward learning from human preferences and demonstrations in Atari

Arxiv

8+阅读 · 2018年11月15日

Supervised and Unsupervised Transfer Learning for Question Answering

Arxiv

4+阅读 · 2018年4月21日

Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

Arxiv

6+阅读 · 2018年4月15日

IQA: Visual Question Answering in Interactive Environments

Arxiv

5+阅读 · 2018年4月5日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Arxiv

5+阅读 · 2018年4月5日

Mix-and-Match Tuning for Self-Supervised Semantic Segmentation

Arxiv

8+阅读 · 2018年1月30日

Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation

Arxiv

4+阅读 · 2018年1月29日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Arxiv

3+阅读 · 2017年11月24日

微信扫码咨询专知VIP会员