We propose DisCo-CLIP, a distributed memory-efficient CLIP training approach, to reduce the memory consumption of the contrastive loss when training contrastive learning models. Our approach decomposes the contrastive loss and its gradient computation into two parts, one to calculate the intra-GPU gradients and the other to compute the inter-GPU gradients. According to our decomposition, only the intra-GPU gradients are computed on the current GPU, while the inter-GPU gradients are collected via all_reduce from other GPUs instead of being repeatedly computed on every GPU. In this way, we can reduce the GPU memory consumption of contrastive loss computation from $\mathcal{O}(B^2)$ to $\mathcal{O}(\frac{B^2}{N})$, where $B$ and $N$ are the batch size and the number of GPUs used for training, respectively. Such a distributed solution is mathematically equivalent to the original non-distributed contrastive loss computation, without sacrificing any computation accuracy. It is particularly efficient for large-batch CLIP training. For instance, DisCo-CLIP can enable contrastive training of a ViT-B/32 model with a batch size of 32K or 196K using 8 or 64 A100 40GB GPUs, respectively, compared with the original CLIP solution, which requires 128 A100 40GB GPUs to train a ViT-B/32 model with a batch size of 32K. The code will be released at https://github.com/IDEA-Research/DisCo-CLIP.
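The approach can be illustrated with a custom autograd function that all-gathers features in the forward pass and all-reduces gradients in the backward pass, so that each GPU only computes its own block of the similarity matrix. Below is a minimal PyTorch sketch of this idea, not the authors' released implementation; the names `AllGatherWithGrad` and `disco_contrastive_loss`, and the loss/gradient scaling conventions, are illustrative assumptions.

```python
# Minimal sketch (not the released DisCo-CLIP code) of the memory-efficient
# distributed contrastive loss described above, using torch.distributed primitives.
import torch
import torch.distributed as dist
import torch.nn.functional as F


class AllGatherWithGrad(torch.autograd.Function):
    """All-gather features in the forward pass; in the backward pass, all_reduce
    the gradients so each GPU receives the inter-GPU gradient contributions
    computed by the other GPUs, then keep only the slice of its local batch."""

    @staticmethod
    def forward(ctx, local_feat):
        gathered = [torch.zeros_like(local_feat) for _ in range(dist.get_world_size())]
        dist.all_gather(gathered, local_feat)
        return torch.cat(gathered, dim=0)

    @staticmethod
    def backward(ctx, grad_output):
        # Sum the gradients of every GPU's local loss w.r.t. the gathered features,
        # i.e. collect the inter-GPU gradients instead of recomputing them locally.
        grad = grad_output.contiguous()
        dist.all_reduce(grad, op=dist.ReduceOp.SUM)
        rank = dist.get_rank()
        b = grad.shape[0] // dist.get_world_size()
        return grad[rank * b:(rank + 1) * b]


def disco_contrastive_loss(img_feat, txt_feat, temperature=0.07):
    """Each GPU materializes only a (B/N) x B block of the similarity matrix,
    so the memory for the logits drops from O(B^2) to O(B^2 / N)."""
    all_img = AllGatherWithGrad.apply(img_feat)  # (B, d)
    all_txt = AllGatherWithGrad.apply(txt_feat)  # (B, d)
    local_b = img_feat.shape[0]
    rank = dist.get_rank()
    # Local rows against all columns: (B/N, B) logits instead of (B, B).
    logits_i2t = img_feat @ all_txt.t() / temperature
    logits_t2i = txt_feat @ all_img.t() / temperature
    labels = torch.arange(local_b, device=img_feat.device) + rank * local_b
    # Depending on how DDP averages parameter gradients, an extra 1/N scaling
    # of this loss (or of the all_reduced gradients) may be required.
    return 0.5 * (F.cross_entropy(logits_i2t, labels) + F.cross_entropy(logits_t2i, labels))
```

The sketch only shows where the all_reduce enters the backward pass; the exact gradient decomposition and its equivalence to the non-distributed loss are established in the paper.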