Vision transformers (ViTs) have achieved impressive results on various computer vision tasks in the last several years. In this work, we study the capability of frozen ViTs, pretrained only on visual data, to generalize to audio-visual data without finetuning any of their original parameters. To do so, we propose a latent audio-visual hybrid (LAVISH) adapter that adapts pretrained ViTs to audio-visual tasks by injecting a small number of trainable parameters into every layer of a frozen ViT. To efficiently fuse visual and audio cues, our LAVISH adapter uses a small set of latent tokens, which form an attention bottleneck, thus eliminating the quadratic cost of standard cross-attention. Compared to existing modality-specific audio-visual methods, our approach achieves competitive or even better performance on various audio-visual tasks while using fewer tunable parameters and without relying on costly audio pretraining or external audio encoders. Our code is available at https://genjib.github.io/project_page/LAVISH/
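
To make the latent-token attention bottleneck concrete, below is a minimal sketch, not the released implementation: it assumes PyTorch, and the module name `LatentBottleneckFusion`, the parameter names, and the two-stage attention layout are illustrative assumptions rather than the paper's exact design.

```python
# Illustrative sketch (assumption, not the authors' code) of a latent-token
# attention bottleneck for fusing audio and visual tokens around a frozen ViT layer.
import torch
import torch.nn as nn


class LatentBottleneckFusion(nn.Module):
    def __init__(self, dim=768, num_latents=8, num_heads=8):
        super().__init__()
        # A small set of learnable latent tokens acts as the bottleneck.
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.collect = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.distribute = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, target_tokens, source_tokens):
        # target_tokens: (B, N_t, dim) tokens of the modality being updated
        # source_tokens: (B, N_s, dim) tokens of the other modality
        B = source_tokens.size(0)
        latents = self.latents.unsqueeze(0).expand(B, -1, -1)
        # Step 1: the few latents attend to the source modality, cost O(L * N_s).
        summary, _ = self.collect(latents, source_tokens, source_tokens)
        # Step 2: target tokens attend to the latents, cost O(N_t * L),
        # avoiding the O(N_t * N_s) cost of direct cross-attention.
        fused, _ = self.distribute(target_tokens, summary, summary)
        # Residual update; the surrounding ViT weights stay frozen, and only
        # the adapter parameters above would be trained.
        return target_tokens + fused
```

In this sketch, only the latent tokens and the two attention blocks are trainable, which mirrors the idea of injecting a small number of parameters per layer while keeping the backbone frozen.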