指定信号传播的特性,以缩小未规范的ResNet的性能差距 (Characterizing signal propagation to close the performance gap in unnormalized ResNets) - 专知论文

会员服务 ·

0

Performer · ResNet · TOOLS · state-of-the-art · 规范化的 ·

2021 年 1 月 27 日

Characterizing signal propagation to close the performance gap in unnormalized ResNets

翻译：指定信号传播的特性,以缩小未规范的ResNet的性能差距

Andrew Brock,Soham De,Samuel L. Smith

from arxiv, Published as a conference paper at ICLR 2021

Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to characterize signal propagation on the forward pass, and leverage these tools to design highly performant ResNets without activation normalization layers. Crucial to our success is an adapted version of the recently proposed Weight Standardization. Our analysis tools show how this technique preserves the signal in networks with ReLU or Swish activation functions by ensuring that the per-channel activation means do not grow with depth. Across a range of FLOP budgets, our networks attain performance competitive with the state-of-the-art EfficientNets on ImageNet.

翻译：批量正常化是几乎所有最先进的图像分类中的一个关键组成部分,但它也带来了实际挑战:它打破了一个批量内培训实例的独立性,可能引起计算和记忆管理费用,并常常导致出乎意料的错误。根据对初始化时深ResNets的理论分析,我们提出了一套简单的分析工具来描述远端通道上的信号传播特征,并利用这些工具来设计高性能ResNet而不激活正常化层。我们成功的关键在于对最近提议的 Weight 标准化进行了调整。我们的分析工具表明,这种技术如何通过确保单声道启动手段不会在深度上增长,从而将信号保存在ReLU或Swish激活功能的网络中。在FLOP的一系列预算中,我们的网络在与图像网络上最先进的节能网络竞争性能。

0

相关内容

Performer

【经典书】计算最优传输，209页pdf，Computational Optimal Transport

【经典书】计算最优传输，209页pdf，Computational Optimal Transport

专知会员服务

75+阅读 · 2021年1月10日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

57+阅读 · 2020年11月21日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【2020新书】Python Pro专业实践原则，Practices of the Python Pro，250页pdf

【2020新书】Python Pro专业实践原则，Practices of the Python Pro，250页pdf

专知会员服务

153+阅读 · 2020年1月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

196+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

计算机视觉最佳实践、代码示例和相关文档

计算机视觉最佳实践、代码示例和相关文档

专知会员服务

20+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

【推荐】TensorFlow手把手CNN实践指南

【推荐】TensorFlow手把手CNN实践指南

机器学习研究会

5+阅读 · 2017年8月17日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

Arxiv

0+阅读 · 2021年3月22日

Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse

Arxiv

0+阅读 · 2021年3月22日

Efficient Construction of Nonlinear Models over Normalized Data

Efficient Construction of Nonlinear Models over Normalized Data

Arxiv

0+阅读 · 2021年3月19日

FATNN: Fast and Accurate Ternary Neural Networks

Arxiv

0+阅读 · 2021年3月19日

Stable ResNet

Arxiv

0+阅读 · 2021年3月18日

High-Performance Large-Scale Image Recognition Without Normalization

Arxiv

5+阅读 · 2021年2月11日

Gated Channel Transformation for Visual Recognition

Arxiv

4+阅读 · 2020年3月27日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Reducing Parameter Space for Neural Network Training

Arxiv

3+阅读 · 2018年8月17日

Efficient and Effective $L_0$ Feature Selection

Efficient and Effective $L_0$ Feature Selection

Arxiv

5+阅读 · 2018年8月7日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

【经典书】计算最优传输，209页pdf，Computational Optimal Transport

【经典书】计算最优传输，209页pdf，Computational Optimal Transport

专知会员服务

75+阅读 · 2021年1月10日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

57+阅读 · 2020年11月21日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【2020新书】Python Pro专业实践原则，Practices of the Python Pro，250页pdf

【2020新书】Python Pro专业实践原则，Practices of the Python Pro，250页pdf

专知会员服务

153+阅读 · 2020年1月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

196+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

计算机视觉最佳实践、代码示例和相关文档

计算机视觉最佳实践、代码示例和相关文档

专知会员服务

20+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【伯克利博士论文】通过真实世界实践赋能机器人自主性

军用无人机集群技术尚未成熟——但潜力可期

人工智能安全治理白皮书（2025）

AgentOps综述：分类、挑战与未来方向

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

【推荐】TensorFlow手把手CNN实践指南

【推荐】TensorFlow手把手CNN实践指南

机器学习研究会

5+阅读 · 2017年8月17日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

Arxiv

0+阅读 · 2021年3月22日

Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse

Arxiv

0+阅读 · 2021年3月22日

Efficient Construction of Nonlinear Models over Normalized Data

Efficient Construction of Nonlinear Models over Normalized Data

Arxiv

0+阅读 · 2021年3月19日

FATNN: Fast and Accurate Ternary Neural Networks

Arxiv

0+阅读 · 2021年3月19日

Stable ResNet

Arxiv

0+阅读 · 2021年3月18日

High-Performance Large-Scale Image Recognition Without Normalization

Arxiv

5+阅读 · 2021年2月11日

Gated Channel Transformation for Visual Recognition

Arxiv

4+阅读 · 2020年3月27日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Reducing Parameter Space for Neural Network Training

Arxiv

3+阅读 · 2018年8月17日

Efficient and Effective $L_0$ Feature Selection

Efficient and Effective $L_0$ Feature Selection

Arxiv

5+阅读 · 2018年8月7日

微信扫码咨询专知VIP会员