消除批量依赖的同时匹配批量正常化的代理 - 规范化活化 (Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence) - 专知论文

会员服务 ·

0

规范化的 · 层规范化 · 批量规范化 · 规范化 · 层 ·

2021 年 11 月 3 日

Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence

翻译：消除批量依赖的同时匹配批量正常化的代理 - 规范化活化

Antoine Labatie,Dominic Masters,Zach Eaton-Rosen,Carlo Luschi

from arxiv, NeurIPS 2021 camera-ready

We investigate the reasons for the performance degradation incurred with batch-independent normalization. We find that the prototypical techniques of layer normalization and instance normalization both induce the appearance of failure modes in the neural network's pre-activations: (i) layer normalization induces a collapse towards channel-wise constant functions; (ii) instance normalization induces a lack of variability in instance statistics, symptomatic of an alteration of the expressivity. To alleviate failure mode (i) without aggravating failure mode (ii), we introduce the technique "Proxy Normalization" that normalizes post-activations using a proxy distribution. When combined with layer normalization or group normalization, this batch-independent normalization emulates batch normalization's behavior and consistently matches or exceeds its performance.

翻译：我们调查了分批独立化导致性能退化的原因。我们发现,原型的层级正常化和例例正常化技术都导致神经网络的预激活中出现故障模式:(一) 层正常化导致向频道常态功能的崩溃;(二) 例正常化导致在实例统计数据中缺乏差异性,表现为表达力的改变。为了缓解故障模式 (一) 在不加重故障模式 (二) 的情况下,我们引入了“质正常化”技术,利用代理分配实现活动后活动正常化。在与层正常化或群体正常化相结合的情况下,这种分批独立的正常化模仿了分批正常化的行为,并始终匹配或超过其性能。

0

相关内容

规范化的

【NUS-Xavier教授】注意力神经网络，79页ppt

【NUS-Xavier教授】注意力神经网络，79页ppt

专知会员服务

66+阅读 · 2021年11月25日

基于图的异常检测，94页ppt

专知会员服务

78+阅读 · 2021年9月27日

【KDD2021】图神经网络，NUS- Xavier Bresson教授

【KDD2021】图神经网络，NUS- Xavier Bresson教授

专知会员服务

67+阅读 · 2021年8月20日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

【伯克利】再思考 Transformer中的Batch Normalization

【伯克利】再思考 Transformer中的Batch Normalization

专知会员服务

41+阅读 · 2020年3月21日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【总结】强化学习需要批归一化(Batch Norm)吗？

【总结】强化学习需要批归一化(Batch Norm)吗？

深度强化学习实验室

28+阅读 · 2020年10月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Conditional Batch Normalization 详解

Conditional Batch Normalization 详解

极市平台

4+阅读 · 2019年4月12日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

9+阅读 · 2018年12月28日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

BAT机器学习面试题1000题（376~380题）

BAT机器学习面试题1000题（376~380题）

七月在线实验室

9+阅读 · 2018年8月27日

视觉机械臂 visual-pushing-grasping

视觉机械臂 visual-pushing-grasping

CreateAMind

3+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

TCL: an ANN-to-SNN Conversion with Trainable Clipping Layers

Arxiv

3+阅读 · 2020年8月11日

Gated Channel Transformation for Visual Recognition

Arxiv

4+阅读 · 2020年3月27日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

Not All Attention Is Needed: Gated Attention Network for Sequence Data

Arxiv

3+阅读 · 2019年12月1日

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Arxiv

3+阅读 · 2019年9月25日

A General and Adaptive Robust Loss Function

A General and Adaptive Robust Loss Function

Arxiv

8+阅读 · 2018年11月5日

Towards Understanding Regularization in Batch Normalization

Towards Understanding Regularization in Batch Normalization

Arxiv

4+阅读 · 2018年9月27日

Self-Attention Generative Adversarial Networks

Arxiv

8+阅读 · 2018年5月21日

The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks

Arxiv

4+阅读 · 2018年4月9日

Group Normalization

Arxiv

7+阅读 · 2018年3月22日

VIP会员

文章信息

相关主题

批量规范化

相关VIP内容

【NUS-Xavier教授】注意力神经网络，79页ppt

【NUS-Xavier教授】注意力神经网络，79页ppt

专知会员服务

66+阅读 · 2021年11月25日

基于图的异常检测，94页ppt

专知会员服务

78+阅读 · 2021年9月27日

【KDD2021】图神经网络，NUS- Xavier Bresson教授

【KDD2021】图神经网络，NUS- Xavier Bresson教授

专知会员服务

67+阅读 · 2021年8月20日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

【伯克利】再思考 Transformer中的Batch Normalization

【伯克利】再思考 Transformer中的Batch Normalization

专知会员服务

41+阅读 · 2020年3月21日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

美陆军五大转型方向

一种Agent自主性风险评估框架 | 最新文献

实时无人机指令处理：一种面向无人机系统的大语言模型方法

基于动态知识图谱的人工智能代理自主研究周期 | 文献

相关资讯

【总结】强化学习需要批归一化(Batch Norm)吗？

【总结】强化学习需要批归一化(Batch Norm)吗？

深度强化学习实验室

28+阅读 · 2020年10月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Conditional Batch Normalization 详解

Conditional Batch Normalization 详解

极市平台

4+阅读 · 2019年4月12日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

9+阅读 · 2018年12月28日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

BAT机器学习面试题1000题（376~380题）

BAT机器学习面试题1000题（376~380题）

七月在线实验室

9+阅读 · 2018年8月27日

视觉机械臂 visual-pushing-grasping

视觉机械臂 visual-pushing-grasping

CreateAMind

3+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

TCL: an ANN-to-SNN Conversion with Trainable Clipping Layers

Arxiv

3+阅读 · 2020年8月11日

Gated Channel Transformation for Visual Recognition

Arxiv

4+阅读 · 2020年3月27日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

Not All Attention Is Needed: Gated Attention Network for Sequence Data

Arxiv

3+阅读 · 2019年12月1日

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Arxiv

3+阅读 · 2019年9月25日

A General and Adaptive Robust Loss Function

A General and Adaptive Robust Loss Function

Arxiv

8+阅读 · 2018年11月5日

Towards Understanding Regularization in Batch Normalization

Towards Understanding Regularization in Batch Normalization

Arxiv

4+阅读 · 2018年9月27日

Self-Attention Generative Adversarial Networks

Arxiv

8+阅读 · 2018年5月21日

The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks

Arxiv

4+阅读 · 2018年4月9日

Group Normalization

Arxiv

7+阅读 · 2018年3月22日

微信扫码咨询专知VIP会员