任务残差法用于微调视觉语言模型 (Task Residual for Tuning Vision-Language Models) - 专知论文

会员服务 ·

0

微调 · 视觉语言模型 · 知识 · tuning · 分类器 ·

2023 年 3 月 24 日

Task Residual for Tuning Vision-Language Models

翻译：任务残差法用于微调视觉语言模型

Tao Yu,Zhihe Lu,Xin Jin,Zhibo Chen,Xinchao Wang

from arxiv, Accepted to CVPR 2023

Large-scale vision-language models (VLMs) pre-trained on billion-level data have learned general visual representations and broad visual concepts. In principle, the well-learned knowledge structure of the VLMs should be inherited appropriately when being transferred to downstream tasks with limited data. However, most existing efficient transfer learning (ETL) approaches for VLMs either damage or are excessively biased towards the prior knowledge, e.g., prompt tuning (PT) discards the pre-trained text-based classifier and builds a new one while adapter-style tuning (AT) fully relies on the pre-trained features. To address this, we propose a new efficient tuning approach for VLMs named Task Residual Tuning (TaskRes), which performs directly on the text-based classifier and explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task. Specifically, TaskRes keeps the original classifier weights from the VLMs frozen and obtains a new classifier for the target task by tuning a set of prior-independent parameters as a residual to the original one, which enables reliable prior knowledge preservation and flexible task-specific knowledge exploration. The proposed TaskRes is simple yet effective, which significantly outperforms previous ETL methods (e.g., PT and AT) on 11 benchmark datasets while requiring minimal effort for the implementation. Our code is available at https://github.com/geekyutao/TaskRes.

翻译：大规模的视觉语言模型（VLM）预训练于十亿级别的数据上，已经学会了广泛的图像概念和通用的视觉表示。理论上，当这些VLM使用于具有限数据量的下游任务时，应当适当地继承已学习的知识结构。但是，现存的VLM有效迁移学习方法，如提示微调法（Prompt Tuning）和适配器式微调法（Adapter-style Tuning），要么损坏先前知识要么过度偏置，它们对于先前的知识结构无法很好地继承。为了解决这个问题，我们提出了一种新的VLM有效微调方法，被命名为任务残差法（Task Residual Tuning，TaskRes）。它直接用于文本分类器，并显式地将预训练模型的先前知识和目标任务的新知识解耦。具体而言，TaskRes将VLM原分类器的权重冻结，通过调整预独立参数的残差来得到适用于目标任务的新分类器，从而实现可靠的知识传承和灵活的任务特定知识探索。TaskRes方法简单而有效，极大地提高了11个基准数据集上的微调性能，而且实现简单。我们的代码可以在 https://github.com/geekyutao/TaskRes 上获得。

0

相关内容

预训练模型如何用在视觉任务？南洋理工最新《视觉语言模型》综述，全面概述视觉语言模型方法体系

预训练模型如何用在视觉任务？南洋理工最新《视觉语言模型》综述，全面概述视觉语言模型方法体系

专知会员服务

53+阅读 · 2023年4月4日

【CVPR 2022】视觉提示调整（VPT），Vision Prompt Tuning

【CVPR 2022】视觉提示调整（VPT），Vision Prompt Tuning

专知会员服务

32+阅读 · 2022年3月12日

中科院自动化所徐波团队最新《视觉-语言预训练》综述

中科院自动化所徐波团队最新《视觉-语言预训练》综述

专知会员服务

67+阅读 · 2022年2月23日

【Manning新书】迁移学习自然语言处理，266页pdf，Transfer Learning for NLP

【Manning新书】迁移学习自然语言处理，266页pdf，Transfer Learning for NLP

专知会员服务

137+阅读 · 2021年11月6日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

专知会员服务

13+阅读 · 2020年4月9日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

专知会员服务

60+阅读 · 2019年12月24日

【AAAI2020接受论文】利用图卷积网络将知识注入文本任务，Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks

【AAAI2020接受论文】利用图卷积网络将知识注入文本任务，Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks

专知会员服务

45+阅读 · 2019年11月11日

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

专知会员服务

14+阅读 · 2019年11月11日

IJCAI 2022 | 使用陈述句进行视觉问答的Prompt Tuning

IJCAI 2022 | 使用陈述句进行视觉问答的Prompt Tuning

PaperWeekly

3+阅读 · 2022年9月21日

打开模型Zero-Shot新范式：Instruction Tuning

打开模型Zero-Shot新范式：Instruction Tuning

PaperWeekly

2+阅读 · 2022年8月25日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

【Github】nlp-tutorial：TensorFlow 和 PyTorch 实现各种NLP模型

【Github】nlp-tutorial：TensorFlow 和 PyTorch 实现各种NLP模型

AINLP

14+阅读 · 2019年9月4日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

基于单语语料的无监督统计机器翻译模型研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于Fermi-LAT和AMS-02的暗物质理论研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于多任务稀疏特征学习的海量图像理解方法研究

国家自然科学基金

2+阅读 · 2013年12月31日

基于主干成分的句法统计机器翻译模型研究

国家自然科学基金

0+阅读 · 2013年12月31日

面向海量数据语义标注众包的任务管理方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于海量语料自然标注信息的汉语自然语块分析

国家自然科学基金

0+阅读 · 2013年12月31日

高分辨率极化SAR图像场景分类研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向综合力学环境预测的回归多任务学习研究

国家自然科学基金

0+阅读 · 2012年12月31日

利用激光解离质谱成像技术研究次级代谢产物在模型植物中的组成和分布

国家自然科学基金

0+阅读 · 2011年12月31日

基于Sparse-Land模型的SAR图像噪声抑制与分割

国家自然科学基金

0+阅读 · 2009年12月31日

Pruning Pre-trained Language Models Without Fine-Tuning

Arxiv

0+阅读 · 2023年5月16日

Masked Autoencoding Does Not Help Natural Language Supervision at Scale

Arxiv

0+阅读 · 2023年5月15日

Improved baselines for vision-language pre-training

Arxiv

0+阅读 · 2023年5月15日

Debiasing Vision-Language Models via Biased Prompts

Arxiv

0+阅读 · 2023年5月15日

Versatile Audio-Visual Learning for Handling Single and Multi Modalities in Emotion Regression and Classification Tasks

Arxiv

0+阅读 · 2023年5月12日

Exploring Zero and Few-shot Techniques for Intent Classification

Arxiv

0+阅读 · 2023年5月11日

Deep Class-Incremental Learning: A Survey

Arxiv

13+阅读 · 2023年2月7日

Conditional Prompt Learning for Vision-Language Models

Conditional Prompt Learning for Vision-Language Models

Arxiv

13+阅读 · 2022年3月10日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

VIP会员

文章信息

相关主题

视觉语言模型

相关VIP内容

预训练模型如何用在视觉任务？南洋理工最新《视觉语言模型》综述，全面概述视觉语言模型方法体系

预训练模型如何用在视觉任务？南洋理工最新《视觉语言模型》综述，全面概述视觉语言模型方法体系

专知会员服务

53+阅读 · 2023年4月4日

【CVPR 2022】视觉提示调整（VPT），Vision Prompt Tuning

【CVPR 2022】视觉提示调整（VPT），Vision Prompt Tuning

专知会员服务

32+阅读 · 2022年3月12日

中科院自动化所徐波团队最新《视觉-语言预训练》综述

中科院自动化所徐波团队最新《视觉-语言预训练》综述

专知会员服务

67+阅读 · 2022年2月23日

【Manning新书】迁移学习自然语言处理，266页pdf，Transfer Learning for NLP

【Manning新书】迁移学习自然语言处理，266页pdf，Transfer Learning for NLP

专知会员服务

137+阅读 · 2021年11月6日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

专知会员服务

13+阅读 · 2020年4月9日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

专知会员服务

60+阅读 · 2019年12月24日

【AAAI2020接受论文】利用图卷积网络将知识注入文本任务，Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks

【AAAI2020接受论文】利用图卷积网络将知识注入文本任务，Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks

专知会员服务

45+阅读 · 2019年11月11日

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

专知会员服务

14+阅读 · 2019年11月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《小型无人机系统侦测追踪技术：声学、计算机视觉与深度学习融合方案》最新98页

《"牧羊人网格"拦截策略：实现无人机集群可靠拦截的新范式》

光纤无人机：反无人机系统的重大挑战

《作战建模与仿真实证研究》

相关资讯

IJCAI 2022 | 使用陈述句进行视觉问答的Prompt Tuning

IJCAI 2022 | 使用陈述句进行视觉问答的Prompt Tuning

PaperWeekly

3+阅读 · 2022年9月21日

打开模型Zero-Shot新范式：Instruction Tuning

打开模型Zero-Shot新范式：Instruction Tuning

PaperWeekly

2+阅读 · 2022年8月25日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

【Github】nlp-tutorial：TensorFlow 和 PyTorch 实现各种NLP模型

【Github】nlp-tutorial：TensorFlow 和 PyTorch 实现各种NLP模型

AINLP

14+阅读 · 2019年9月4日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

Pruning Pre-trained Language Models Without Fine-Tuning

Arxiv

0+阅读 · 2023年5月16日

Masked Autoencoding Does Not Help Natural Language Supervision at Scale

Arxiv

0+阅读 · 2023年5月15日

Improved baselines for vision-language pre-training

Arxiv

0+阅读 · 2023年5月15日

Debiasing Vision-Language Models via Biased Prompts

Arxiv

0+阅读 · 2023年5月15日

Versatile Audio-Visual Learning for Handling Single and Multi Modalities in Emotion Regression and Classification Tasks

Arxiv

0+阅读 · 2023年5月12日

Exploring Zero and Few-shot Techniques for Intent Classification

Arxiv

0+阅读 · 2023年5月11日

Deep Class-Incremental Learning: A Survey

Arxiv

13+阅读 · 2023年2月7日

Conditional Prompt Learning for Vision-Language Models

Conditional Prompt Learning for Vision-Language Models

Arxiv

13+阅读 · 2022年3月10日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

相关基金

基于单语语料的无监督统计机器翻译模型研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于Fermi-LAT和AMS-02的暗物质理论研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于多任务稀疏特征学习的海量图像理解方法研究

国家自然科学基金

2+阅读 · 2013年12月31日

基于主干成分的句法统计机器翻译模型研究

国家自然科学基金

0+阅读 · 2013年12月31日

面向海量数据语义标注众包的任务管理方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于海量语料自然标注信息的汉语自然语块分析

国家自然科学基金

0+阅读 · 2013年12月31日

高分辨率极化SAR图像场景分类研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向综合力学环境预测的回归多任务学习研究

国家自然科学基金

0+阅读 · 2012年12月31日

利用激光解离质谱成像技术研究次级代谢产物在模型植物中的组成和分布

国家自然科学基金

0+阅读 · 2011年12月31日

基于Sparse-Land模型的SAR图像噪声抑制与分割

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员