Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning. At each step of training, a mini-batch of samples is drawn from the training dataset and the weights of the neural network are adjusted according to the performance on this specific subset of examples. The mini-batch sampling procedure introduces stochastic dynamics into gradient descent, with non-trivial state-dependent noise. We characterize the stochasticity of SGD and of a recently introduced variant, persistent SGD, in a prototypical neural network model. In the under-parametrized regime, where the final training error is positive, the SGD dynamics reaches a stationary state and we define an effective temperature from the fluctuation-dissipation theorem, computed from dynamical mean-field theory. We use the effective temperature to quantify the magnitude of the SGD noise as a function of the problem parameters. In the over-parametrized regime, where the training error vanishes, we measure the noise magnitude of SGD by computing the average distance between two replicas of the system with the same initialization and different realizations of the SGD noise. We find that the two noise measures behave similarly as a function of the problem parameters. Moreover, we observe that noisier algorithms lead to wider decision boundaries of the corresponding constraint satisfaction problem.
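The replica-distance measure of the SGD noise described above can be illustrated numerically. The following is a minimal sketch, not the paper's exact setup: the toy dataset, the squared-hinge loss, and the simple membership-refresh rule used to mimic persistent SGD are all illustrative assumptions. Two replicas share an initialization, are driven by independent mini-batch sequences, and the rescaled distance between their weight vectors is tracked along the trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (assumption): Gaussian inputs with random binary labels.
N, P = 100, 300                               # input dimension, number of samples
X = rng.standard_normal((P, N)) / np.sqrt(N)
y = rng.choice([-1.0, 1.0], size=P)

def grad(w, Xb, yb):
    """Gradient of 0.5 * sum_mu max(0, 1 - y_mu w.x_mu)^2 over a mini-batch."""
    gaps = np.maximum(0.0, 1.0 - yb * (Xb @ w))   # zero for satisfied samples
    return -(gaps * yb) @ Xb

def new_mask(s, b, tau, rng):
    """Illustrative persistent-SGD-style batch: each sample refreshes its
    membership with probability 1/tau; tau = 1 reduces to plain mini-batch SGD
    (a fresh Bernoulli(b) mask at every step)."""
    refresh = rng.random(s.shape) < 1.0 / tau
    s = s.copy()
    s[refresh] = rng.random(refresh.sum()) < b
    return s

def run_two_replicas(lr=0.05, b=0.3, tau=1.0, steps=2000):
    """Train two replicas from the same initialization with independent
    mini-batch noise and return the rescaled distance between them over time."""
    w0 = rng.standard_normal(N)                # shared initialization
    w1, w2 = w0.copy(), w0.copy()
    s1 = rng.random(P) < b                     # independent batch masks
    s2 = rng.random(P) < b
    dist = []
    for _ in range(steps):
        s1 = new_mask(s1, b, tau, rng)
        s2 = new_mask(s2, b, tau, rng)         # different noise realizations
        w1 -= lr * grad(w1, X[s1], y[s1])
        w2 -= lr * grad(w2, X[s2], y[s2])
        dist.append(np.linalg.norm(w1 - w2) / np.sqrt(N))
    return np.array(dist)

d_sgd = run_two_replicas(tau=1.0)              # plain mini-batch SGD
d_psgd = run_two_replicas(tau=10.0)            # more persistent mini-batches
print(f"final replica distance  SGD: {d_sgd[-1]:.3f}   persistent SGD: {d_psgd[-1]:.3f}")
```

In this sketch the asymptotic replica distance serves as a proxy for the effective noise magnitude of the algorithm, and it can be scanned against the batch fraction b, the persistence time tau, or the ratio P/N to mimic the dependence on the problem parameters discussed in the abstract.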