外观和语文外观和语文导航多式文本样式转让 (Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation) - 专知论文

会员服务 ·

0

多峰值 · Google Maps API · state-of-the-art · 学成 · Performance ·

2020 年 7 月 1 日

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

翻译：外观和语文外观和语文导航多式文本样式转让

Wanrong Zhu,Xin Wang,Tsu-Jui Fu,An Yan,Pradyumna Narayana,Kazoo Sone,Sugato Basu,William Yang Wang

In the vision-and-language navigation (VLN) task, an agent follows natural language instructions and navigate in visual environments. Compared to the indoor navigation task that has been broadly studied, navigation in real-life outdoor environments remains a significant challenge with its complicated visual inputs and an insufficient amount of instructions that illustrate the intricate urban scenes. In this paper, we introduce a Multimodal Text Style Transfer (MTST) learning approach to mitigate the problem of data scarcity in outdoor navigation tasks by effectively leveraging external multimodal resources. We first enrich the navigation data by transferring the style of the instructions generated by Google Maps API, then pre-train the navigator with the augmented external outdoor navigation dataset. Experimental results show that our MTST learning approach is model-agnostic, and our MTST approach significantly outperforms the baseline models on the outdoor VLN task, improving task completion rate by 22\% relatively on the test set and achieving new state-of-the-art performance.

翻译：在视觉和语言导航(VLN)任务中,一个代理人在视觉环境中遵循自然语言指令和导航。与经过广泛研究的室内导航任务相比,在现实生活中室外环境中的导航仍是一个重大挑战,因为其视觉投入复杂,说明城市复杂环境的指示数量不足。在本文中,我们引入了多模式文本样式传输(MTST)学习方法,通过有效利用外部多式联运资源,减轻户外导航任务中的数据稀缺问题。我们首先通过转让谷歌地图API生成的指示风格来丰富导航数据,然后用扩大的外部室外导航数据集对导航员进行预先培训。实验结果显示,我们的MTT学习方法是模型级的,而我们的MTT方法大大超越了室外VLN任务的基线模型,在测试中相对提高了22 ⁇ 的任务完成率,并实现了新的最新业绩。

0

相关内容

多峰值

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

专知会员服务

52+阅读 · 2020年5月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

专知会员服务

31+阅读 · 2020年3月11日

【Google】无监督机器翻译，Unsupervised Machine Translation

【Google】无监督机器翻译，Unsupervised Machine Translation

专知会员服务

36+阅读 · 2020年3月3日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

想在PyTorch里训练BERT，请试试Facebook跨语言模型XLM

想在PyTorch里训练BERT，请试试Facebook跨语言模型XLM

量子位

3+阅读 · 2019年6月23日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

推荐｜深度强化学习聊天机器人（附论文）！

推荐｜深度强化学习聊天机器人（附论文）！

全球人工智能

4+阅读 · 2018年1月30日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

Speech2Action: Cross-modal Supervision for Action Recognition

Speech2Action: Cross-modal Supervision for Action Recognition

Arxiv

7+阅读 · 2020年3月30日

Visual Grounding in Video for Unsupervised Word Translation

Visual Grounding in Video for Unsupervised Word Translation

Arxiv

7+阅读 · 2020年3月11日

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Arxiv

3+阅读 · 2019年3月6日

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Arxiv

9+阅读 · 2018年11月25日

Predicting Visual Features from Text for Image and Video Caption Retrieval

Arxiv

5+阅读 · 2018年7月14日

Neural Network Based Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction

Arxiv

6+阅读 · 2018年4月23日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Arxiv

5+阅读 · 2018年4月5日

Learning to Guide Decoding for Image Captioning

Arxiv

6+阅读 · 2018年4月3日

Joint Training for Neural Machine Translation Models with Monolingual Data

Arxiv

4+阅读 · 2018年3月1日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Arxiv

3+阅读 · 2017年11月24日

VIP会员

文章信息

相关主题

Google Maps API

state-of-the-art

相关VIP内容

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

专知会员服务

52+阅读 · 2020年5月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

专知会员服务

31+阅读 · 2020年3月11日

【Google】无监督机器翻译，Unsupervised Machine Translation

【Google】无监督机器翻译，Unsupervised Machine Translation

专知会员服务

36+阅读 · 2020年3月3日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《物联网（IoT）中的无人机通信高效控制》135页

《在GNSS信号降级环境中利用共识实现无人机集群稳健协调》

中程单向攻击无人机的战略意义：俄乌战争启示

《面向无人机集群的避障动态传感器覆盖算法》最新38页

相关资讯

想在PyTorch里训练BERT，请试试Facebook跨语言模型XLM

想在PyTorch里训练BERT，请试试Facebook跨语言模型XLM

量子位

3+阅读 · 2019年6月23日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

推荐｜深度强化学习聊天机器人（附论文）！

推荐｜深度强化学习聊天机器人（附论文）！

全球人工智能

4+阅读 · 2018年1月30日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

相关论文

Speech2Action: Cross-modal Supervision for Action Recognition

Speech2Action: Cross-modal Supervision for Action Recognition

Arxiv

7+阅读 · 2020年3月30日

Visual Grounding in Video for Unsupervised Word Translation

Visual Grounding in Video for Unsupervised Word Translation

Arxiv

7+阅读 · 2020年3月11日

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Arxiv

3+阅读 · 2019年3月6日

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Arxiv

9+阅读 · 2018年11月25日

Predicting Visual Features from Text for Image and Video Caption Retrieval

Arxiv

5+阅读 · 2018年7月14日

Neural Network Based Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction

Arxiv

6+阅读 · 2018年4月23日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Arxiv

5+阅读 · 2018年4月5日

Learning to Guide Decoding for Image Captioning

Arxiv

6+阅读 · 2018年4月3日

Joint Training for Neural Machine Translation Models with Monolingual Data

Arxiv

4+阅读 · 2018年3月1日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Arxiv

3+阅读 · 2017年11月24日

微信扫码咨询专知VIP会员