具有国家依赖性噪音的 Stopchatic 渐变源动态 (Dynamic of Stochastic Gradient Descent with State-Dependent Noise) - 专知论文

会员服务 ·

0

极小值 · SGD · 尖锐最小值 · 平坦最小值 · 随机梯度下降 ·

2020 年 10 月 12 日

Dynamic of Stochastic Gradient Descent with State-Dependent Noise

翻译：具有国家依赖性噪音的 Stopchatic 渐变源动态

Qi Meng,Shiqi Gong,Wei Chen,Zhi-Ming Ma,Tie-Yan Liu

Stochastic gradient descent (SGD) and its variants are mainstream methods to train deep neural networks. Since neural networks are non-convex, more and more works study the dynamic behavior of SGD and the impact to its generalization, especially the escaping efficiency from local minima. However, these works take the over-simplified assumption that the covariance of the noise in SGD is (or can be upper bounded by) constant, although it is actually state-dependent. In this work, we conduct a formal study on the dynamic behavior of SGD with state-dependent noise. Specifically, we show that the covariance of the noise of SGD in the local region of the local minima is a quadratic function of the state. Thus, we propose a novel power-law dynamic with state-dependent diffusion to approximate the dynamic of SGD. We prove that, power-law dynamic can escape from sharp minima exponentially faster than flat minima, while the previous dynamics can only escape sharp minima polynomially faster than flat minima. Our experiments well verified our theoretical results. Inspired by our theory, we propose to add additional state-dependent noise into (large-batch) SGD to further improve its generalization ability. Experiments verify that our method is effective.

翻译：电磁梯度下降及其变种是培养深神经网络的主流方法。由于神经网络是非隐形的,越来越多的工程研究SGD的动态行为及其对其一般化的影响,特别是从本地迷你中逃脱效率。然而,这些工程采用了过于简单化的假设,即SGD中噪音的共变是常态的(或可以被上层圈套的),尽管它实际上取决于各州。在这项工作中,我们对SGD的动态行为进行了正式研究,以国家为主的噪音为主。具体地说,我们表明SGD在本地迷你马地区噪音的共变异性是国家的一个二次函数。因此,我们提出了一个新的权力法动态,以国家为主的传播为主,以近似SGD的动态。我们证明,电法动态能够从尖锐的迷你马中快速逃脱,而前一种动态只能比平流迷你迷你马快速逃脱。我们的实验很好地证实了我们的理论结果。根据我们的理论,我们提议,我们进一步的实验将它推导,我们更能地将其推算成新的SGD。

0

相关内容

极小值

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

58+阅读 · 2020年11月21日

「NeurIPS 2020」基于局部子图的图元学习

专知会员服务

46+阅读 · 2020年10月22日

【NeurIPS 2020】耶鲁大学等提出「AdaBelief」的新型优化器，速度快，训练稳，泛化强

专知会员服务

18+阅读 · 2020年10月19日

《小样本元学习》2020最新综述论文

《小样本元学习》2020最新综述论文

专知会员服务

173+阅读 · 2020年7月31日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

非凸优化与统计学，89页ppt，普林斯顿Yuxin Chen博士

非凸优化与统计学，89页ppt，普林斯顿Yuxin Chen博士

专知会员服务

103+阅读 · 2020年6月28日

【普林斯顿大学-微软】加权元学习，Weighted Meta-Learning

【普林斯顿大学-微软】加权元学习，Weighted Meta-Learning

专知会员服务

40+阅读 · 2020年3月25日

康奈尔大学Jon Kleinberg经典书《算法设计Algorithm Design》课件PPT与电子书，864页pdf

康奈尔大学Jon Kleinberg经典书《算法设计Algorithm Design》课件PPT与电子书，864页pdf

专知会员服务

235+阅读 · 2020年1月21日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

误差反向传播——RNN

误差反向传播——RNN

统计学习与视觉计算组

18+阅读 · 2018年9月6日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Stochastic Gradient Descent with Nonlinear Conjugate Gradient-Style Adaptive Momentum

Arxiv

0+阅读 · 2020年12月3日

SSGD: A safe and efficient method of gradient descent

Arxiv

0+阅读 · 2020年12月3日

The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning

Arxiv

0+阅读 · 2020年12月3日

rTop-k: A Statistical Estimation Approach to Distributed SGD

Arxiv

0+阅读 · 2020年12月2日

The duality structure gradient descent algorithm: analysis and applications to neural networks

Arxiv

0+阅读 · 2020年12月2日

Discretization of a distributed control problem with a stochastic parabolic equation driven by multiplicative noise

Arxiv

0+阅读 · 2020年11月30日

Dynamic Regret of Convex and Smooth Functions

Arxiv

0+阅读 · 2020年11月28日

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

Arxiv

4+阅读 · 2019年5月9日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+阅读 · 2018年7月16日

VIP会员

文章信息

相关主题

尖锐最小值

平坦最小值

随机梯度下降

相关VIP内容

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

58+阅读 · 2020年11月21日

「NeurIPS 2020」基于局部子图的图元学习

专知会员服务

46+阅读 · 2020年10月22日

【NeurIPS 2020】耶鲁大学等提出「AdaBelief」的新型优化器，速度快，训练稳，泛化强

专知会员服务

18+阅读 · 2020年10月19日

《小样本元学习》2020最新综述论文

《小样本元学习》2020最新综述论文

专知会员服务

173+阅读 · 2020年7月31日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

非凸优化与统计学，89页ppt，普林斯顿Yuxin Chen博士

非凸优化与统计学，89页ppt，普林斯顿Yuxin Chen博士

专知会员服务

103+阅读 · 2020年6月28日

【普林斯顿大学-微软】加权元学习，Weighted Meta-Learning

【普林斯顿大学-微软】加权元学习，Weighted Meta-Learning

专知会员服务

40+阅读 · 2020年3月25日

康奈尔大学Jon Kleinberg经典书《算法设计Algorithm Design》课件PPT与电子书，864页pdf

康奈尔大学Jon Kleinberg经典书《算法设计Algorithm Design》课件PPT与电子书，864页pdf

专知会员服务

235+阅读 · 2020年1月21日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【牛津博士论文】零样本强化学习综述

《美军条令：陆军指挥官与规划人员地理空间指南》60页

战术边缘指挥控制：防务面临的核心挑战

迈向开放世界检测：综述

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

误差反向传播——RNN

误差反向传播——RNN

统计学习与视觉计算组

18+阅读 · 2018年9月6日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

相关论文

Stochastic Gradient Descent with Nonlinear Conjugate Gradient-Style Adaptive Momentum

Arxiv

0+阅读 · 2020年12月3日

SSGD: A safe and efficient method of gradient descent

Arxiv

0+阅读 · 2020年12月3日

The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning

Arxiv

0+阅读 · 2020年12月3日

rTop-k: A Statistical Estimation Approach to Distributed SGD

Arxiv

0+阅读 · 2020年12月2日

The duality structure gradient descent algorithm: analysis and applications to neural networks

Arxiv

0+阅读 · 2020年12月2日

Discretization of a distributed control problem with a stochastic parabolic equation driven by multiplicative noise

Arxiv

0+阅读 · 2020年11月30日

Dynamic Regret of Convex and Smooth Functions

Arxiv

0+阅读 · 2020年11月28日

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

Arxiv

4+阅读 · 2019年5月9日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+阅读 · 2018年7月16日

微信扫码咨询专知VIP会员