May 12, 2023

Optimal signal propagation in ResNets through residual scaling

Kirsten Fischer, David Dahmen, Moritz Helias

From arXiv; 19 pages, 6 figures; under review

Residual networks (ResNets) have significantly better trainability and thus performance than feed-forward networks at large depth. Introducing skip connections facilitates signal propagation to deeper layers. In addition, previous works found that adding a scaling parameter for the residual branch further improves generalization performance. While they empirically identified a particularly beneficial range of values for this scaling parameter, the associated performance improvement and its universality across network hyperparameters have yet to be understood. For feed-forward networks (FFNets), finite-size theories have led to important insights with regard to signal propagation and hyperparameter tuning. Here we derive a systematic finite-size theory for ResNets to study signal propagation and its dependence on the scaling of the residual branch. We derive analytical expressions for the response function, a measure of the network's sensitivity to inputs, and show that for deep networks the empirically found values of the scaling parameter lie within the range of maximal sensitivity. Furthermore, we obtain an analytical expression for the optimal scaling parameter that depends only weakly on other network hyperparameters, such as the weight variance, thereby explaining its universality across hyperparameters. Overall, this work provides a framework for theory-guided optimal scaling in ResNets and, more generally, a theoretical framework for studying ResNets at finite widths.
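To make the idea of a scaled residual branch concrete, the following is a minimal NumPy sketch, not the authors' model or code. It assumes a simple block of the form h <- h + rho * W tanh(h), with Gaussian weights of standard deviation sigma_w / sqrt(width); the names `rho`, `sigma_w`, `resnet_forward`, `init_weights`, and `input_sensitivity` are all hypothetical. The finite-difference quantity computed here is only a rough numerical proxy for sensitivity to inputs, not the analytical response function derived in the paper.

```python
import numpy as np


def init_weights(depth, width, sigma_w, seed=0):
    """Draw one Gaussian weight matrix per residual block (assumed initialization)."""
    rng = np.random.default_rng(seed)
    scale = sigma_w / np.sqrt(width)
    return [rng.normal(0.0, scale, size=(width, width)) for _ in range(depth)]


def resnet_forward(x, weights, rho):
    """Apply residual blocks h <- h + rho * W @ tanh(h); rho scales the residual branch."""
    h = np.asarray(x, dtype=float)
    for W in weights:
        h = h + rho * (W @ np.tanh(h))
    return h


def input_sensitivity(x, weights, rho, eps=1e-4, seed=1):
    """Finite-difference proxy: ||f(x + eps*u) - f(x)|| / eps for a random unit vector u."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=np.shape(x))
    u /= np.linalg.norm(u)
    diff = resnet_forward(x + eps * u, weights, rho) - resnet_forward(x, weights, rho)
    return np.linalg.norm(diff) / eps


if __name__ == "__main__":
    weights = init_weights(depth=50, width=64, sigma_w=1.0)
    x = np.random.default_rng(2).normal(size=64)
    # Scan a few illustrative (arbitrarily chosen) values of the residual scaling.
    for rho in (0.0, 0.1, 0.3, 1.0):
        print(f"rho={rho:.1f}  sensitivity proxy={input_sensitivity(x, weights, rho):.3f}")
```

With rho = 0 the residual branch is switched off, so the network reduces to the identity map and the proxy equals 1 up to rounding. Larger values of rho let the random branches contribute more strongly at each layer, which is the dependence the abstract studies analytically.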

