Vision-and-language pre-training aims to learn visual and linguistic representations jointly so that they can be transferred to visual-linguistic downstream tasks. However, semantic confusion between language and vision arises during the pre-training stage. Moreover, current pre-trained models tend to consume substantial computational resources when fine-tuned on downstream tasks. In this work, we present a simple but effective approach for learning Contrastive and Adaptive representations of Vision and Language, namely CAVL. Specifically, we introduce a pair-wise contrastive loss that learns alignments between each whole sentence and each image within the same batch during pre-training. At the fine-tuning stage, we introduce two lightweight adaptation networks that reduce the number of trainable parameters and speed up training, saving computational resources. We evaluate CAVL on six major downstream tasks: Visual Question Answering (VQA), Visual Commonsense Reasoning (VCR), Natural Language for Visual Reasoning (NLVR), Region-to-Phrase Grounding (RPG), Text-to-Image Retrieval (TIR), and Zero-shot Text-to-Image Retrieval (ZS-TIR). Compared to baselines, we achieve superior performance and reduce fine-tuning time by a large margin (in particular, by 76.17%). Extensive experiments and ablation studies demonstrate the efficiency of the contrastive pre-training and adaptive fine-tuning proposed in CAVL.
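To make the pair-wise contrastive objective concrete, the sketch below shows a symmetric InfoNCE-style loss over a batch of image and sentence embeddings, where matched pairs on the diagonal are positives and all other in-batch pairs are negatives. This is a minimal illustration under common assumptions (L2-normalized embeddings, a fixed temperature); the abstract does not give CAVL's exact formulation, so treat the details as hypothetical.

```python
import torch
import torch.nn.functional as F

def pairwise_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric image-sentence contrastive loss over a batch.

    Matched (image, sentence) pairs sit on the diagonal of the
    similarity matrix and serve as positives; every other pair in the
    batch is a negative. Illustrative only: CAVL's exact loss may differ.
    """
    # Normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarities between every image and every sentence.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Align images to sentences and sentences to images symmetrically.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```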
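The abstract also does not specify the architecture of the two lightweight adaptation networks; a standard instantiation of the idea is a bottleneck adapter with a residual connection, trained while the pre-trained backbone stays frozen. The sketch below illustrates that pattern; the module name and bottleneck width are assumptions for illustration, not CAVL's design.

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Lightweight adaptation module for a frozen pre-trained backbone.

    Only the small down/up projections are trained at fine-tuning time,
    which cuts trainable parameters and speeds up training. Hypothetical
    sketch: the adapters in CAVL may be structured differently.
    """

    def __init__(self, hidden_dim, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x):
        # Residual connection preserves the frozen backbone's features.
        return x + self.up(self.act(self.down(x)))
```

In such a setup, the backbone's parameters are frozen (`requires_grad = False`) and only the adapters are updated, which is what yields the reduction in fine-tuning cost that the abstract reports.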