TVLT: Textless Vision-Language Transformer - 专知 Papers

September 28, 2022

TVLT: Textless Vision-Language Transformer

Zineng Tang,Jaemin Cho,Yixin Nie,Mohit Bansal

From arXiv; NeurIPS 2022 (21 pages; the first three authors contributed equally)

In this work, we present the Textless Vision-Language Transformer (TVLT), where homogeneous transformer blocks take raw visual and audio inputs for vision-and-language representation learning with minimal modality-specific design, and do not use text-specific modules such as tokenization or automatic speech recognition (ASR). TVLT is trained by reconstructing masked patches of continuous video frames and audio spectrograms (masked autoencoding) and contrastive modeling to align video and audio. TVLT attains performance comparable to its text-based counterpart, on various multimodal tasks, such as visual question answering, image retrieval, video retrieval, and multimodal sentiment analysis, with 28x faster inference speed and only 1/3 of the parameters. Our findings suggest the possibility of learning compact and efficient visual-linguistic representations from low-level visual and audio signals without assuming the prior existence of text. Our code and checkpoints are available at: https://github.com/zinengtang/TVLT
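
The masked-autoencoding objective mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see their repository for that): the function names, the 16x16 patch size, the 75% mask ratio, and the all-zeros "decoder output" are assumptions chosen only to show the idea of hiding most patches of a spectrogram (or video frame) and scoring the reconstruction on the hidden patches alone.

```python
import numpy as np


def patchify(x, patch):
    """Split a 2D array (H, W) into non-overlapping patch x patch tiles, each flattened."""
    h, w = x.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be multiples of patch"
    tiles = x.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return tiles.reshape(-1, patch * patch)


def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean array that is True at the positions of masked patches."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


def masked_mse(pred, target, mask):
    """Mean squared error averaged over the masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())


rng = np.random.default_rng(0)

# Toy stand-in for an audio spectrogram; a video frame would be handled the same way.
spectrogram = rng.standard_normal((32, 64))

patches = patchify(spectrogram, patch=16)                    # 8 patches of 256 values
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)   # hide 6 of the 8 patches (assumed ratio)
visible = patches[~mask]                                     # an encoder would see only these

# Placeholder for a decoder's reconstruction of all patches (hypothetical, all zeros).
pred = np.zeros_like(patches)
loss = masked_mse(pred, patches, mask)
```

In a real model, `pred` would come from a decoder that receives the encoded visible patches plus mask tokens; the sketch only fixes the data flow and the loss. A perfect reconstruction, `masked_mse(patches, patches, mask)`, gives a loss of zero.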
