Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to characterize signal propagation on the forward pass, and leverage these tools to design highly performant ResNets without activation normalization layers. Crucial to our success is an adapted version of the recently proposed Weight Standardization. Our analysis tools show how this technique preserves the signal in networks with ReLU or Swish activation functions by ensuring that the per-channel activation means do not grow with depth. Across a range of FLOP budgets, our networks attain performance competitive with the state-of-the-art EfficientNets on ImageNet.
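To make the adapted Weight Standardization concrete, the sketch below shows one plausible way to realize the scheme as a drop-in convolution layer in PyTorch. It is a minimal illustration of the idea stated above, not the paper's reference implementation: the class name `ScaledWSConv2d`, the `eps` constant, and the learnable per-channel `gain` parameter are assumptions made here for readability. The key step is standardizing each filter over its fan-in and scaling by 1/sqrt(fan-in), so that per-channel activation statistics stay stable with depth.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledWSConv2d(nn.Conv2d):
    """Conv2d with a scaled variant of Weight Standardization (a sketch).

    Each filter is standardized to zero mean and unit variance over its
    fan-in, then scaled by 1/sqrt(fan_in), so that for unit-variance
    inputs the pre-activations have approximately unit variance at
    initialization and per-channel means do not grow with depth.
    """

    def __init__(self, *args, gain=True, eps=1e-4, **kwargs):
        super().__init__(*args, **kwargs)
        # Learnable per-output-channel gain (an illustrative affine parameter).
        self.gain = nn.Parameter(torch.ones(self.out_channels, 1, 1, 1)) if gain else None
        self.eps = eps

    def standardized_weight(self):
        w = self.weight
        fan_in = w[0].numel()  # in_channels/groups * kh * kw
        # Per-filter mean and variance over the fan-in dimensions.
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), keepdim=True)
        # Standardize and scale by 1/sqrt(fan_in): the sum over fan_in
        # entries then has unit variance, preserving signal scale.
        w = (w - mean) / torch.sqrt(var * fan_in + self.eps)
        if self.gain is not None:
            w = w * self.gain
        return w

    def forward(self, x):
        return F.conv2d(x, self.standardized_weight(), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

# Usage: a standardized conv whose outputs have roughly unit variance at init.
conv = ScaledWSConv2d(64, 128, kernel_size=3, padding=1)
x = torch.randn(8, 64, 32, 32)
y = conv(x)
```

At initialization, a layer like this would be paired with a rescaled nonlinearity; for ReLU the appropriate gain works out to sqrt(2 / (1 - 1/pi)) ≈ 1.714, which restores unit variance after the activation and is what lets the signal-propagation analysis above verify that means and variances remain controlled at depth.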