RECLIP: Resource-efficient CLIP by Training with Small Images - Zhuanzhi Papers


Computational resources · Supervision · Text retrieval · Pretraining · Baseline

April 12, 2023

RECLIP: Resource-efficient CLIP by Training with Small Images


Runze Li, Dahun Kim, Bir Bhanu, Weicheng Kuo

We present RECLIP (Resource-efficient CLIP), a simple method that minimizes the computational resource footprint of CLIP (Contrastive Language Image Pretraining). Inspired by the notion of coarse-to-fine in computer vision, we leverage small images to learn from large-scale language supervision efficiently, and finetune the model with high-resolution data in the end. Since the complexity of the vision transformer heavily depends on input image size, our approach significantly reduces the training resource requirements both in theory and in practice. Using the same batch size and number of training epochs, RECLIP achieves highly competitive zero-shot classification and image-text retrieval accuracy with 6 to 8$\times$ fewer computational resources and 7 to 9$\times$ fewer FLOPs than the baseline. Compared to the state-of-the-art contrastive learning methods, RECLIP demonstrates 5 to 59$\times$ training resource savings while maintaining highly competitive zero-shot classification and retrieval performance. We hope this work will pave the path for the broader research community to explore language-supervised pretraining in more resource-friendly settings.
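The abstract's claim that vision transformer cost depends heavily on input image size can be illustrated with a rough count of tokens and multiply-accumulate operations. The sketch below is not the authors' accounting. It assumes a plain ViT encoder with patch size 16, width 768, depth 12 and an MLP ratio of 4 (ViT-B/16-like values chosen for illustration), ignores the class token, layer norms and the patch embedding, and compares two hypothetical resolutions, 224 and 64, that are not taken from the abstract. The function names `vit_tokens` and `vit_encoder_macs` are made up for this example.

```python
def vit_tokens(image_size: int, patch_size: int = 16) -> int:
    """Number of patch tokens for a square image (class token not counted)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    return side * side


def vit_encoder_macs(
    image_size: int,
    patch_size: int = 16,  # assumed, ViT-B/16-like
    width: int = 768,      # assumed hidden dimension
    depth: int = 12,       # assumed number of encoder layers
    mlp_ratio: int = 4,    # assumed MLP expansion factor
) -> int:
    """Approximate multiply-accumulates per image for the encoder stack."""
    n = vit_tokens(image_size, patch_size)
    proj = 4 * n * width * width             # Q, K, V and output projections
    attn = 2 * n * n * width                 # QK^T scores and attention-weighted V
    mlp = 2 * mlp_ratio * n * width * width  # two linear layers of the MLP
    return depth * (proj + attn + mlp)


if __name__ == "__main__":
    large, small = 224, 64  # hypothetical resolutions, for illustration only
    print("tokens:", vit_tokens(large), "vs", vit_tokens(small))
    ratio = vit_encoder_macs(large) / vit_encoder_macs(small)
    print(f"approximate encoder cost ratio: {ratio:.2f}x")
```

Under these assumptions the token count falls from 196 to 16, and the cost ratio is slightly larger than the token ratio because the attention term grows quadratically with the number of tokens. These figures describe only the image encoder in isolation and are not meant to reproduce the savings reported in the abstract.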

