Large-scale multi-modal contrastive pre-training has demonstrated great utility in learning transferable features for a range of downstream tasks by mapping multiple modalities into a shared embedding space. Typically, this has employed separate encoders for each modality. However, recent work suggests that transformers can support learning across multiple modalities and allow knowledge sharing. Inspired by this, we investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks. More specifically, we question how many parameters of a transformer model can be shared across modalities during contrastive pre-training, and rigorously examine architectural design choices that position the proportion of shared parameters along a spectrum. Under the studied conditions, we observe that a mostly unified encoder for vision and language signals outperforms all other variations that separate more parameters. Additionally, we find that lightweight modality-specific parallel modules further improve performance. Experimental results show that the proposed MS-CLIP approach outperforms vanilla CLIP by up to 13\% relative in zero-shot ImageNet classification (pre-trained on YFCC-100M), while simultaneously reducing the parameter count. In addition, our approach outperforms vanilla CLIP by 1.6 points in linear probing on a collection of 24 downstream vision tasks. Furthermore, we discover that sharing parameters leads to semantic concepts from different modalities being encoded more closely in the embedding space, facilitating the transfer of common semantic structure (e.g., attention patterns) from language to vision. Code is available at \href{https://github.com/Hxyou/MSCLIP}{URL}.
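To make the modality-shared idea concrete, below is a minimal PyTorch sketch, not the paper's actual implementation: it shares a transformer block's attention and MLP weights between the image and text streams, keeps small per-modality LayerNorms as a stand-in for the lightweight modality-specific parallel modules, and trains with the standard CLIP-style symmetric contrastive loss. All module names, and the choice of LayerNorms as the modality-specific component, are illustrative assumptions.

\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedTransformerBlock(nn.Module):
    """One transformer block whose attention and MLP weights are shared
    by both modalities; only the LayerNorms are modality-specific here
    (an assumed, simplified stand-in for the paper's parallel modules)."""
    def __init__(self, dim, heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        # Tiny per-modality parameter sets; everything else is shared.
        self.norm1 = nn.ModuleDict(
            {m: nn.LayerNorm(dim) for m in ("image", "text")})
        self.norm2 = nn.ModuleDict(
            {m: nn.LayerNorm(dim) for m in ("image", "text")})

    def forward(self, x, modality):
        # Same weights process both streams, selected by the modality key:
        # block(img_tokens, "image") and block(txt_tokens, "text").
        h = self.norm1[modality](x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2[modality](x))
        return x

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE loss over a batch of paired
    image/text embeddings of shape (batch, dim)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
\end{verbatim}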