Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics

Noise · Stochastic gradient descent · Approximation · Hyperparameters · Learning rate

September 20, 2021


Yixin Wu,Rui Luo,Chen Zhang,Jun Wang,Yaodong Yang

From arXiv, 18 pages

In this paper, we characterize the noise of stochastic gradients and analyze the noise-induced dynamics that arise when training deep neural networks with gradient-based optimizers. Specifically, we first show that the stochastic gradient noise possesses finite variance, and therefore the classical Central Limit Theorem (CLT) applies; this indicates that the gradient noise is asymptotically Gaussian. This asymptotic result validates the widely accepted assumption of Gaussian noise. We clarify that the recently observed heavy tails in gradient noise may not be an intrinsic property, but rather a consequence of insufficient mini-batch size: the gradient noise, which is a sum of a limited number of i.i.d. random variables, has not reached the asymptotic regime of the CLT and thus deviates from Gaussian. We quantitatively measure the goodness of the Gaussian approximation of the noise, which supports our conclusion. Second, we analyze the noise-induced dynamics of stochastic gradient descent using the Langevin equation, which gives the momentum hyperparameter of the optimizer a physical interpretation. We then demonstrate the existence of a steady-state distribution of stochastic gradient descent and approximate that distribution at a small learning rate.
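The CLT argument in the abstract can be illustrated with a small numerical sketch. This is not the paper's experiment. It assumes a toy setting in which per-sample gradient noise is drawn i.i.d. from a Laplace distribution (finite variance, heavier tails than a Gaussian), and it uses sample excess kurtosis as a stand-in measure of Gaussianity, which may differ from the goodness-of-fit measure the authors use. The function names `excess_kurtosis` and `minibatch_noise` are hypothetical.

```python
import numpy as np


def excess_kurtosis(x):
    """Sample excess kurtosis; the population value is 0 for a Gaussian."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return np.mean(centered ** 4) / variance ** 2 - 3.0


def minibatch_noise(batch_size, n_batches, rng):
    """Toy mini-batch gradient noise: the mean of `batch_size` i.i.d. draws.

    Assumption: per-sample noise is Laplace(0, 1), chosen only because it
    has finite variance and a population excess kurtosis of 3.
    """
    samples = rng.laplace(loc=0.0, scale=1.0, size=(n_batches, batch_size))
    return samples.mean(axis=1)


rng = np.random.default_rng(0)
batch_sizes = (1, 8, 64, 256)
results = {
    b: excess_kurtosis(minibatch_noise(b, n_batches=20000, rng=rng))
    for b in batch_sizes
}

for b, k in results.items():
    print(f"batch size {b:4d}: excess kurtosis {k:+.3f}")
```

For i.i.d. variables with a finite fourth moment, the excess kurtosis of the mean of n draws equals the per-sample excess kurtosis divided by n, so the printed values should shrink toward zero as the batch size grows. Under these assumptions, small batches look heavy-tailed only because the sum has not yet reached the asymptotic regime.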

