Deep neural networks have been successfully trained with stochastic gradient descent in various application areas. However, there is no rigorous mathematical explanation of why this works so well. The training of neural networks with stochastic gradient descent involves four different discretization parameters: (i) the network architecture; (ii) the amount of training data; (iii) the number of gradient steps; and (iv) the number of randomly initialized gradient trajectories. While it can be shown that the approximation error converges to zero if all four parameters are sent to infinity in the right order, we demonstrate in this paper that stochastic gradient descent fails to converge for ReLU networks if their depth is much larger than their width and the number of random initializations does not increase to infinity fast enough.
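To make the four discretization parameters concrete, the following minimal NumPy sketch (our own illustration, not the construction or proof technique from the paper) trains a deep, narrow fully connected ReLU network with plain SGD on a toy one-dimensional regression problem and repeats the training over several independent random initializations. The variables `depth`, `width`, `n`, `steps`, and `restarts` are hypothetical choices standing in for the architecture, the amount of training data, the number of gradient steps, and the number of random initializations.

```python
# Minimal sketch (illustration only): the four discretization parameters of
# SGD training of a ReLU network, with depth much larger than width and a
# fixed number of independently initialized gradient trajectories.
import numpy as np

rng = np.random.default_rng(0)

def init_params(depth, width, d_in=1, d_out=1):
    """He-style random initialization of a fully connected ReLU network."""
    dims = [d_in] + [width] * (depth - 1) + [d_out]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, k)), np.zeros(k))
            for m, k in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """Forward pass; returns all layer activations (needed for backprop)."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.maximum(z, 0.0))
    return acts

def sgd_step(params, x, y, lr):
    """One stochastic gradient step on the mean squared loss over the batch."""
    acts = forward(params, x)
    grad = 2.0 * (acts[-1] - y) / len(x)        # dLoss / d(network output)
    new_params = []
    for i in reversed(range(len(params))):
        W, b = params[i]
        gW = acts[i].T @ grad
        gb = grad.sum(axis=0)
        # ReLU derivative; the mask is unused (and harmless) once i == 0.
        grad = (grad @ W.T) * (acts[i] > 0)
        new_params.append((W - lr * gW, b - lr * gb))
    return new_params[::-1]

# (ii) training data: a toy one-dimensional regression problem
n = 256
x_train = rng.uniform(-1.0, 1.0, size=(n, 1))
y_train = np.sin(np.pi * x_train)

# (i) architecture with depth >> width, (iii) gradient steps, (iv) random restarts
depth, width, steps, restarts = 20, 3, 2000, 10

best_loss = np.inf
for k in range(restarts):
    params = init_params(depth, width)
    for _ in range(steps):
        idx = rng.choice(n, size=32)
        params = sgd_step(params, x_train[idx], y_train[idx], lr=1e-2)
    loss = np.mean((forward(params, x_train)[-1] - y_train) ** 2)
    best_loss = min(best_loss, loss)
    print(f"restart {k}: final training loss {loss:.4f}")
print(f"best loss over {restarts} restarts: {best_loss:.4f}")
```

With these hypothetical settings, many of the individual restarts tend to stagnate at a large loss (for instance because the deep, narrow network starts in a nearly inactive configuration), so the quality of the reported result depends strongly on how many random initializations are tried, which is the parameter the non-convergence statement above is about.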