In Vision-and-Language Navigation (VLN), researchers typically take an image encoder pre-trained on ImageNet without fine-tuning on the environments in which the agent will be trained or tested. However, the distribution shift between the ImageNet training images and the views in the navigation environments may render the ImageNet pre-trained image encoder suboptimal. Therefore, in this paper, we design a set of structure-encoding auxiliary tasks (SEA) that leverage the data in the navigation environments to pre-train and improve the image encoder. Specifically, we design and customize (1) 3D jigsaw, (2) traversability prediction, and (3) instance classification to pre-train the image encoder. Through rigorous ablations, our SEA pre-trained features are shown to better encode structural information of the scenes, which ImageNet pre-trained features fail to encode properly but which is crucial for the target navigation task. The SEA pre-trained features can be easily plugged into existing VLN agents without any tuning. For example, on Test-Unseen environments, VLN agents combined with our SEA pre-trained features achieve absolute success rate improvements of 12% for Speaker-Follower, 5% for Env-Dropout, and 4% for AuxRN.