ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning

Capitalizing on large pre-trained models for various downstream tasks of interest has recently emerged with promising performance. Due to the ever-growing model size, the standard task adaptation strategy based on full fine-tuning becomes prohibitively costly in terms of model training and storage. This has led to a new research direction in parameter-efficient transfer learning. However, existing attempts typically focus on downstream tasks in the same modality (e.g., image understanding) as the pre-trained model. This is limiting because in some modalities (e.g., video understanding) such a strong pre-trained model with sufficient knowledge is less available or not available at all. In this work, we investigate this novel cross-modality transfer learning setting, namely parameter-efficient image-to-video transfer learning. To solve this problem, we propose a new Spatio-Temporal Adapter (ST-Adapter) for parameter-efficient fine-tuning per video task. With a built-in spatio-temporal reasoning capability in a compact design, ST-Adapter enables a pre-trained image model without temporal knowledge to reason about dynamic video content at a small (~8%) per-task parameter cost, requiring approximately 20 times fewer updated parameters compared to previous work. Extensive experiments on video action recognition tasks show that our ST-Adapter can match or even outperform the strong full fine-tuning strategy and state-of-the-art video models, whilst enjoying the advantage of parameter efficiency. The code and model are available at https://github.com/linziyi96/st-adapter
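To give a feel for what a per-task parameter cost of roughly 8% can mean, the following is a back-of-the-envelope sketch, not the authors' implementation. The abstract does not specify the backbone or the adapter layout, so every number and name here is an assumption made for illustration: a ViT-B-sized image backbone (hidden width 768, 12 blocks, about 86M parameters), one adapter per block, and a hypothetical adapter made of a linear down-projection, a depthwise temporal convolution with kernel size 3, and a linear up-projection with bottleneck width 384.

```python
# Rough parameter count for a hypothetical bottleneck adapter with a
# depthwise temporal convolution. All sizes are illustrative assumptions,
# not values taken from the abstract.

def adapter_params(hidden: int, bottleneck: int, kernel_t: int) -> int:
    """Count weights and biases of one assumed adapter."""
    down = hidden * bottleneck + bottleneck        # linear down-projection
    conv = bottleneck * kernel_t + bottleneck      # depthwise conv: one temporal filter per channel
    up = bottleneck * hidden + hidden              # linear up-projection
    return down + conv + up


HIDDEN = 768                  # assumed backbone width (ViT-B-like)
BOTTLENECK = 384              # assumed adapter bottleneck width
KERNEL_T = 3                  # assumed temporal kernel size
NUM_BLOCKS = 12               # assumed number of transformer blocks
BACKBONE_PARAMS = 86_000_000  # assumed frozen backbone size

per_adapter = adapter_params(HIDDEN, BOTTLENECK, KERNEL_T)
total_adapter = per_adapter * NUM_BLOCKS
fraction = total_adapter / BACKBONE_PARAMS

print(f"parameters per adapter: {per_adapter:,}")
print(f"trainable adapter parameters per task: {total_adapter:,}")
print(f"fraction of backbone size: {fraction:.1%}")
```

Under these assumptions the adapters add about 7.1M trainable parameters per task, which is on the order of 8% of the frozen backbone. Only the adapter weights would need to be stored per task, while the backbone is shared across tasks.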

