Stable ResNet - Zhuanzhi Papers

March 18, 2021

Soufiane Hayou, Eugenio Clerico, Bobby He, George Deligiannidis, Arnaud Doucet, Judith Rousseau

From arXiv, 43 pages, 4 figures

Deep ResNet architectures have achieved state-of-the-art performance on many tasks. While they solve the problem of vanishing gradients, they might suffer from exploding gradients as the depth becomes large (Yang et al. 2017). Moreover, recent results have shown that ResNets might lose expressivity as the depth goes to infinity (Yang et al. 2017, Hayou et al. 2019). To resolve these issues, we introduce a new class of ResNet architectures, called Stable ResNet, that stabilizes the gradient while ensuring expressivity in the infinite-depth limit.
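The abstract does not say how the stabilization is achieved. As a rough illustration only, the sketch below assumes that each residual branch is multiplied by a uniform factor of 1/sqrt(L), where L is the depth, and compares how the squared norm of the activations grows with and without that factor in a randomly initialized ReLU residual network. The function name `squared_norm_ratio`, the width, the initialization N(0, 2/width), and the scaling factor itself are all assumptions made for this example, not details taken from the abstract.

```python
import math
import random


def squared_norm_ratio(depth, width=64, scaled=True, seed=0):
    """Return ||y_L||^2 / ||y_0||^2 for a random ReLU residual network.

    Each layer computes y <- y + scale * W relu(y), with the entries of W
    drawn i.i.d. from N(0, 2 / width). `scale` is 1/sqrt(depth) when
    `scaled` is True (the assumed stabilizing factor) and 1 otherwise.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(depth) if scaled else 1.0
    std = math.sqrt(2.0 / width)

    # Random input vector and its squared norm.
    y = [rng.gauss(0.0, 1.0) for _ in range(width)]
    norm0 = sum(v * v for v in y)

    for _ in range(depth):
        h = [max(v, 0.0) for v in y]  # ReLU activation
        # Residual branch: a fresh random weight matrix applied to h.
        branch = [sum(rng.gauss(0.0, std) * hj for hj in h) for _ in range(width)]
        # Skip connection plus the (optionally scaled) branch.
        y = [yi + scale * bi for yi, bi in zip(y, branch)]

    return sum(v * v for v in y) / norm0


if __name__ == "__main__":
    for depth in (10, 50):
        plain = squared_norm_ratio(depth, scaled=False)
        stable = squared_norm_ratio(depth, scaled=True)
        print(f"depth={depth:3d}  unscaled={plain:.3e}  scaled={stable:.3e}")
```

Under these assumptions the unscaled squared norm roughly doubles at every layer, so it grows exponentially with depth, whereas the scaled version grows by a factor of about (1 + 1/L) per layer and therefore stays of order one. This only illustrates forward-pass behaviour at initialization; it says nothing about the gradient or expressivity results claimed in the paper.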
