托盘白白批量正常化 (Stochastic Whitening Batch Normalization) - 专知论文

会员服务 ·

0

白化 · 规范化的 · 批量规范化 · 泛化理论 · INFORMS ·

2021 年 6 月 3 日

Stochastic Whitening Batch Normalization

翻译：托盘白白批量正常化

Shengdong Zhang,Ehsan Nezhadarya,Homa Fashandi,Jiayi Liu,Darin Graham,Mohak Shah

from arxiv, Accepted to the Main Conference of CVPR 2021

Batch Normalization (BN) is a popular technique for training Deep Neural Networks (DNNs). BN uses scaling and shifting to normalize activations of mini-batches to accelerate convergence and improve generalization. The recently proposed Iterative Normalization (IterNorm) method improves these properties by whitening the activations iteratively using Newton's method. However, since Newton's method initializes the whitening matrix independently at each training step, no information is shared between consecutive steps. In this work, instead of exact computation of whitening matrix at each time step, we estimate it gradually during training in an online fashion, using our proposed Stochastic Whitening Batch Normalization (SWBN) algorithm. We show that while SWBN improves the convergence rate and generalization of DNNs, its computational overhead is less than that of IterNorm. Due to the high efficiency of the proposed method, it can be easily employed in most DNN architectures with a large number of layers. We provide comprehensive experiments and comparisons between BN, IterNorm, and SWBN layers to demonstrate the effectiveness of the proposed technique in conventional (many-shot) image classification and few-shot classification tasks.

翻译：普通化(BN)是培训深神经网络(DNN)的流行技术。BN使用缩放和转换,将微型插管的启动正常化,以加速趋同,改进一般化。最近提议的超常化(IterNorm)方法通过使用牛顿方法使激活的白化率(IterNorm)法更新这些属性。然而,由于牛顿的方法在每一培训步骤中独立初始化白化矩阵,因此在每一培训步骤中,连续步骤之间不能共享任何信息。在这项工作中,除了准确计算每个步骤的白化矩阵之外,我们使用我们提议的Stochastartic白批正常化(SWBNBN)算法(SWBN)算法(IterNorm)法(IterNorm),SWBNBN(SWB、ILERNORM和SWBNBS)级培训中,我们利用我们提议的SWBN、SAL-SAL-S、SL II 和SAL II 和S-SG II AL 和 AL AL 等等等和SAL 和S 等等等等等等等等等等等和等等等等等等等等等等、等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等等

0

相关内容

计算机理论顶会STOC 2021奖项出炉，滕尚华等华人学者获奖

专知会员服务

8+阅读 · 2021年7月22日

【FB陈鑫磊-何恺明】SimSiam：消除表征学习“崩溃解”，探寻对比表达学习成功之根源

【FB陈鑫磊-何恺明】SimSiam：消除表征学习“崩溃解”，探寻对比表达学习成功之根源

专知会员服务

20+阅读 · 2020年11月26日

【ICML 2020】设置LayerNorm使Transformer加速收敛

专知会员服务

16+阅读 · 2020年7月27日

来自Fariz Darari博士的一份简明《神经网络与深度学习》的讲义，64页ppt

来自Fariz Darari博士的一份简明《神经网络与深度学习》的讲义，64页ppt

专知会员服务

92+阅读 · 2020年5月5日

为什么批处理规范会导致梯度爆炸，Why Batch Norm Causes Exploding Gradients

为什么批处理规范会导致梯度爆炸，Why Batch Norm Causes Exploding Gradients

专知会员服务

17+阅读 · 2020年4月2日

【伯克利】再思考 Transformer中的Batch Normalization

【伯克利】再思考 Transformer中的Batch Normalization

专知会员服务

41+阅读 · 2020年3月21日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【Google 76分钟训练万BERT最新论文】Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

【Google 76分钟训练万BERT最新论文】Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

专知会员服务

4+阅读 · 2020年1月7日

【Google】神经架构搜索（Neural Architecture Search and Beyond），Barret Zoph

【Google】神经架构搜索（Neural Architecture Search and Beyond），Barret Zoph

专知会员服务

31+阅读 · 2019年11月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec智能推荐

6+阅读 · 2019年3月7日

PyTorch中在反向传播前为什么要手动将梯度清零？

PyTorch中在反向传播前为什么要手动将梯度清零？

极市平台

39+阅读 · 2019年1月23日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

专知

25+阅读 · 2018年4月29日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【干货】Batch Normalization: 如何更快地训练深度神经网络

【干货】Batch Normalization: 如何更快地训练深度神经网络

专知

13+阅读 · 2018年3月6日

神经网络学习率设置

神经网络学习率设置

机器学习研究会

4+阅读 · 2018年3月3日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Whittle Index for A Class of Restless Bandits with Imperfect Observations

Arxiv

0+阅读 · 2021年8月9日

Nonperturbative renormalization for the neural network-QFT correspondence

Arxiv

0+阅读 · 2021年8月3日

Does Data Augmentation Benefit from Split BatchNorms

Does Data Augmentation Benefit from Split BatchNorms

Arxiv

3+阅读 · 2020年10月15日

Few-shot Neural Architecture Search

Arxiv

8+阅读 · 2020年6月15日

Resolution Adaptive Networks for Efficient Inference

Arxiv

5+阅读 · 2020年3月16日

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Arxiv

3+阅读 · 2019年9月25日

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

Arxiv

4+阅读 · 2019年5月9日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Towards Understanding Regularization in Batch Normalization

Towards Understanding Regularization in Batch Normalization

Arxiv

4+阅读 · 2018年9月27日

Group Normalization

Arxiv

7+阅读 · 2018年3月22日

VIP会员

文章信息

相关主题

批量规范化

相关VIP内容

计算机理论顶会STOC 2021奖项出炉，滕尚华等华人学者获奖

专知会员服务

8+阅读 · 2021年7月22日

【FB陈鑫磊-何恺明】SimSiam：消除表征学习“崩溃解”，探寻对比表达学习成功之根源

【FB陈鑫磊-何恺明】SimSiam：消除表征学习“崩溃解”，探寻对比表达学习成功之根源

专知会员服务

20+阅读 · 2020年11月26日

【ICML 2020】设置LayerNorm使Transformer加速收敛

专知会员服务

16+阅读 · 2020年7月27日

来自Fariz Darari博士的一份简明《神经网络与深度学习》的讲义，64页ppt

来自Fariz Darari博士的一份简明《神经网络与深度学习》的讲义，64页ppt

专知会员服务

92+阅读 · 2020年5月5日

为什么批处理规范会导致梯度爆炸，Why Batch Norm Causes Exploding Gradients

为什么批处理规范会导致梯度爆炸，Why Batch Norm Causes Exploding Gradients

专知会员服务

17+阅读 · 2020年4月2日

【伯克利】再思考 Transformer中的Batch Normalization

【伯克利】再思考 Transformer中的Batch Normalization

专知会员服务

41+阅读 · 2020年3月21日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【Google 76分钟训练万BERT最新论文】Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

【Google 76分钟训练万BERT最新论文】Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

专知会员服务

4+阅读 · 2020年1月7日

【Google】神经架构搜索（Neural Architecture Search and Beyond），Barret Zoph

【Google】神经架构搜索（Neural Architecture Search and Beyond），Barret Zoph

专知会员服务

31+阅读 · 2019年11月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】扩展可扩展会话推荐的边界

别想太多：高效 R1 风格大型推理模型综述

【ACMMM2025】EvoVLMA: 进化式视觉-语言模型自适应

智能体网络：用AI智能体编织下一代网络

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec智能推荐

6+阅读 · 2019年3月7日

PyTorch中在反向传播前为什么要手动将梯度清零？

PyTorch中在反向传播前为什么要手动将梯度清零？

极市平台

39+阅读 · 2019年1月23日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

专知

25+阅读 · 2018年4月29日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【干货】Batch Normalization: 如何更快地训练深度神经网络

【干货】Batch Normalization: 如何更快地训练深度神经网络

专知

13+阅读 · 2018年3月6日

神经网络学习率设置

神经网络学习率设置

机器学习研究会

4+阅读 · 2018年3月3日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Whittle Index for A Class of Restless Bandits with Imperfect Observations

Arxiv

0+阅读 · 2021年8月9日

Nonperturbative renormalization for the neural network-QFT correspondence

Arxiv

0+阅读 · 2021年8月3日

Does Data Augmentation Benefit from Split BatchNorms

Does Data Augmentation Benefit from Split BatchNorms

Arxiv

3+阅读 · 2020年10月15日

Few-shot Neural Architecture Search

Arxiv

8+阅读 · 2020年6月15日

Resolution Adaptive Networks for Efficient Inference

Arxiv

5+阅读 · 2020年3月16日

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Arxiv

3+阅读 · 2019年9月25日

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

Arxiv

4+阅读 · 2019年5月9日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Towards Understanding Regularization in Batch Normalization

Towards Understanding Regularization in Batch Normalization

Arxiv

4+阅读 · 2018年9月27日

Group Normalization

Arxiv

7+阅读 · 2018年3月22日

微信扫码咨询专知VIP会员