We propose the Vision-and-Augmented-Language Transformer (VAuLT). VAuLT is an extension of the popular Vision-and-Language Transformer (ViLT) that improves performance on vision-and-language tasks involving more complex text inputs than image captions, while having minimal impact on training and inference efficiency. ViLT enables efficient training and inference on vision-and-language tasks by using a shallow image encoder. However, it is pretrained on captioning and similar datasets, where the language input is simple, literal, and descriptive, and therefore lacks linguistic diversity. When working with multimedia data in the wild, such as multimodal social media data (in our work, from Twitter), there is a notable shift away from captioning-style language, as well as greater task diversity, and we indeed find evidence that the language capacity of ViLT is the bottleneck. The key insight of VAuLT is to propagate the output representations of a large language model such as BERT to the language input of ViLT. We show that this strategy significantly improves over ViLT on vision-and-language tasks involving richer language inputs and affective constructs, such as TWITTER-2015, TWITTER-2017, MVSA-Single, and MVSA-Multiple, but lags behind on purer reasoning tasks such as the Bloomberg Twitter Text-Image Relationship dataset. We have released the code for all our experiments at https://github.com/gchochla/VAuLT.
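The wiring described above — replacing ViLT's token-embedding lookup with BERT's contextual output representations before the joint encoder — can be sketched as follows. This is a minimal shape-level illustration using NumPy stand-ins: the stub functions, dimensions, and modality-type embeddings are assumptions for exposition, not the released implementation (both BERT-base and ViLT happen to share a 768-dimensional hidden size, which is what makes this propagation straightforward).

```python
import numpy as np

# Illustrative shapes (assumed): batch, text length, image patches, hidden dim.
# 768 matches both BERT-base and ViLT hidden sizes.
B, L, N, D = 2, 16, 145, 768

rng = np.random.default_rng(0)

def bert_encoder(token_ids):
    # Stand-in for BERT: one contextual embedding per input token.
    return rng.standard_normal((token_ids.shape[0], token_ids.shape[1], D))

def patch_embed(images):
    # Stand-in for ViLT's shallow linear patch projection (its "image encoder").
    return rng.standard_normal((images.shape[0], N, D))

token_ids = rng.integers(0, 30522, size=(B, L))
images = rng.standard_normal((B, 3, 384, 384))

# VAuLT's key change: instead of ViLT's own (shallow) token-embedding lookup,
# feed BERT's output representations as the language input.
text_embeds = bert_encoder(token_ids)   # (B, L, D)
image_embeds = patch_embed(images)      # (B, N, D)

# Add per-modality type embeddings and concatenate, as in ViLT,
# before the joint transformer encoder processes both modalities together.
text_type = rng.standard_normal((D,))
image_type = rng.standard_normal((D,))
joint_input = np.concatenate(
    [text_embeds + text_type, image_embeds + image_type], axis=1
)  # (B, L + N, D)
```

Because the language input to the joint encoder keeps the same shape and dimensionality as in ViLT, this substitution leaves the downstream architecture untouched, which is why the efficiency impact is minimal.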