Maximal Initial Learning Rates in Deep ReLU Networks


Learning rate · Networking · Learning · ReLU · Width

May 26, 2023


Gaurav Iyer, Boris Hanin, David Rolnick

From arXiv; International Conference on Machine Learning (ICML) 2023

Training a neural network requires choosing a suitable learning rate, which involves a trade-off between the speed and the effectiveness of convergence. While there has been considerable theoretical and empirical analysis of how large the learning rate can be, most prior work focuses only on late-stage training. In this work, we introduce the maximal initial learning rate $\eta^{\ast}$: the largest learning rate at which a randomly initialized neural network can successfully begin training and achieve (at least) a given threshold accuracy. Using a simple approach to estimate $\eta^{\ast}$, we observe that in constant-width fully-connected ReLU networks, $\eta^{\ast}$ behaves differently from the maximum learning rate later in training. Specifically, we find that $\eta^{\ast}$ is well predicted as a power of $\text{depth} \times \text{width}$, provided that (i) the width of the network is sufficiently large compared to the depth, and (ii) the input layer is trained at a relatively small learning rate. We further analyze the relationship between $\eta^{\ast}$ and the sharpness $\lambda_{1}$ of the network at initialization (the largest eigenvalue of the Hessian of the loss), finding that the two are closely, though not inversely, related. We formally prove bounds for $\lambda_{1}$ in terms of $\text{depth} \times \text{width}$ that align with our empirical results.
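The abstract describes two computational steps: estimating $\eta^{\ast}$ as the largest learning rate that still reaches the threshold accuracy, and then predicting $\eta^{\ast}$ as a power of $\text{depth} \times \text{width}$. The abstract does not spell out the estimation procedure, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that training success is monotone in the learning rate and uses a geometric bisection. `trains_ok` is a hypothetical callback that would initialize a fresh network, train it at the given rate, and report whether the threshold was reached. The power law $\eta^{\ast} \approx c\,(\text{depth} \times \text{width})^{\alpha}$ is fitted by ordinary least squares in log-log space. The names `estimate_max_initial_lr` and `fit_power_law` are illustrative.

```python
import math
from typing import Callable, Sequence, Tuple


def estimate_max_initial_lr(
    trains_ok: Callable[[float], bool],
    lo: float,
    hi: float,
    rel_tol: float = 1e-3,
    max_iter: int = 60,
) -> float:
    """Geometric bisection for the largest learning rate that still trains.

    Assumptions (not from the paper): trains_ok(lo) is True, trains_ok(hi) is
    False, and success is monotone in the learning rate. trains_ok is a
    hypothetical callback that should initialize a fresh network, train it at
    the given rate, and report whether the threshold accuracy was reached.
    """
    if not trains_ok(lo):
        raise ValueError("lower bound must train successfully")
    if trains_ok(hi):
        raise ValueError("upper bound must fail to train")
    for _ in range(max_iter):
        if hi / lo <= 1.0 + rel_tol:
            break
        # Learning rates span orders of magnitude, so split the bracket in log space.
        mid = math.sqrt(lo * hi)
        if trains_ok(mid):
            lo = mid
        else:
            hi = mid
    return lo  # largest rate actually observed to succeed


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = c * x**alpha by least squares on (log x, log y); returns (c, alpha)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    alpha = sxy / sxx
    c = math.exp(my - alpha * mx)
    return c, alpha


if __name__ == "__main__":
    # Synthetic stand-in for real training runs: pretend every rate <= 0.05 succeeds.
    eta_star = estimate_max_initial_lr(lambda lr: lr <= 0.05, lo=1e-4, hi=10.0)
    print(f"estimated eta* ~ {eta_star:.5f}")

    # Synthetic (depth, width) grid with a made-up exponent, only to exercise the fit.
    grid = [(2, 64), (4, 128), (8, 256), (16, 512)]
    dw = [d * w for d, w in grid]
    etas = [3.0 * x ** -0.5 for x in dw]
    c, alpha = fit_power_law(dw, etas)
    print(f"c ~ {c:.3f}, alpha ~ {alpha:.3f}")
```

The demo values (the 0.05 threshold, the constant 3.0, and the exponent -0.5) are synthetic and are not results from the paper. Bisecting in log space rather than linearly keeps the number of expensive training runs small when the initial bracket spans several decades. In practice each `trains_ok` call is noisy across random seeds, so the monotonicity assumption may only hold approximately.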

