Contrastive learning has been applied successfully to learn vector representations of various forms of data, such as text and images. The learned encoders exhibit versatile transfer capabilities to many downstream tasks, and representation-based search is highly efficient with state-of-the-art performance. Previous research has demonstrated that learning high-quality representations requires a large number of negatives in the contrastive loss. In practice, the in-batch negative technique is used: for each example in a batch, the positives of the other batch examples are taken as its negatives, avoiding the cost of encoding extra negatives. This, however, still conditions each example's loss on all batch examples and requires fitting the entire large batch into GPU memory. This paper introduces a re-computation technique that decouples backpropagation between the contrastive loss and the encoder, removing the encoder's backward-pass data dependency along the batch dimension. As a result, gradients can be computed for one subset of the batch at a time, leading to almost constant peak GPU memory usage across batch sizes.
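To make the decoupling concrete, the following is a minimal PyTorch sketch of the re-computation idea, under assumed names (`encoder`, `sub_batches`, `contrastive_loss` are illustrative, not the paper's actual API): representations are first computed without building the encoder's graph, the full-batch contrastive loss is backpropagated only to the representations, and each sub-batch is then re-encoded with gradients enabled so the cached representation gradients can be pushed through the encoder one subset at a time.

```python
# Hypothetical sketch of the re-computation technique; all names are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F


def contrastive_loss(reps: torch.Tensor) -> torch.Tensor:
    # Toy in-batch-negative loss for illustration: treat the i-th row
    # as its own positive and every other row as a negative.
    sims = reps @ reps.T                                  # (B, B) similarities
    labels = torch.arange(reps.size(0), device=reps.device)
    return F.cross_entropy(sims, labels)


def gradient_cached_step(encoder, sub_batches, optimizer):
    # Step 1: forward every sub-batch WITHOUT tracking gradients, so peak
    # memory holds only the small representation tensors, not activations.
    with torch.no_grad():
        reps = [encoder(x) for x in sub_batches]

    # Step 2: compute the full-batch contrastive loss on the concatenated
    # representations and cache its gradient w.r.t. each representation.
    all_reps = torch.cat(reps).requires_grad_()
    loss = contrastive_loss(all_reps)
    loss.backward()                                       # fills all_reps.grad only
    rep_grads = all_reps.grad.split([r.size(0) for r in reps])

    # Step 3: re-encode one sub-batch at a time WITH gradients enabled and
    # backpropagate the cached representation gradient through the encoder.
    # Parameter gradients accumulate across sub-batches, so peak memory is
    # bounded by a single sub-batch regardless of total batch size.
    optimizer.zero_grad()
    for x, g in zip(sub_batches, rep_grads):
        r = encoder(x)
        r.backward(gradient=g)                            # vector-Jacobian product

    optimizer.step()
    return loss.item()
```

Because step 2 touches only the representation tensors, the loss remains conditioned on the entire batch while the memory-heavy encoder forward/backward in step 3 proceeds sub-batch by sub-batch.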