SGD 5 永不逃离摊接点 (SGD May Never Escape Saddle Points) - 专知论文

会员服务 ·

0

SGD · 局部极大点 · 鞍点 · Neural Networks · 样例 ·

2021 年 9 月 22 日

SGD May Never Escape Saddle Points

翻译：SGD 5 永不逃离摊接点

Liu Ziyin,Botao Li,James B. Simon,Masahito Ueda

from arxiv, Typoes fixed

Stochastic gradient descent (SGD) has been deployed to solve highly non-linear and non-convex machine learning problems such as the training of deep neural networks. However, previous works on SGD often rely on restrictive and unrealistic assumptions about the nature of noise in SGD. In this work, we mathematically construct examples that defy previous understandings of SGD. For example, our constructions show that: (1) SGD may converge to a local maximum; (2) SGD may escape a saddle point arbitrarily slowly; (3) SGD may prefer sharp minima over the flat ones; and (4) AMSGrad may converge to a local maximum. We also show the relevance of our result to deep learning by presenting a minimal neural network example. Our result suggests that the noise structure of SGD might be more important than the loss landscape in neural network training and that future research should focus on deriving the actual noise structure in deep learning.

翻译：然而,以前关于SGD的工作往往依赖于对SGD噪音性质的限制性和不现实的假设。在这项工作中,我们用数学来构建一些与以前对SGD的理解相悖的例子。例如,我们的建筑表明:(1) SGD可能与当地的最高值趋同;(2) SGD可能任意地缓慢地逃过一个马鞍点;(3) SGD可能更喜欢尖锐的迷你,而不是平坦的;(4) AMSGrad可能与当地的最高值趋同。我们还通过提出一个最小的神经网络范例,表明我们的成果与深层次的学习的相关性。我们的结果表明,SGD的噪音结构可能比神经网络培训中丢失的场景更为重要,而且未来的研究应当侧重于深层学习中的实际噪音结构。

0

相关内容

SGD

【ICML2021】动量残差神经网络

专知会员服务

31+阅读 · 2021年7月19日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

Yoshua Bengio，使算法知道“为什么”

Yoshua Bengio，使算法知道“为什么”

专知会员服务

8+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

深度卷积神经网络中的降采样

深度卷积神经网络中的降采样

极市平台

12+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum

Arxiv

1+阅读 · 2021年11月11日

Performance of Bayesian linear regression in a model with mismatch

Arxiv

0+阅读 · 2021年11月10日

Towards Gradient-based Bilevel Optimization with Non-convex Followers and Beyond

Arxiv

5+阅读 · 2021年10月1日

Large Scale Private Learning via Low-rank Reparametrization

Arxiv

5+阅读 · 2021年6月17日

Momentum Residual Neural Networks

Arxiv

7+阅读 · 2021年5月13日

Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization

Arxiv

4+阅读 · 2020年2月14日

Risk-Aware Active Inverse Reinforcement Learning

Risk-Aware Active Inverse Reinforcement Learning

Arxiv

8+阅读 · 2019年1月8日

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+阅读 · 2018年7月16日

A Study on Overfitting in Deep Reinforcement Learning

Arxiv

7+阅读 · 2018年4月20日

Optimal Algorithms for Distributed Optimization

Arxiv

3+阅读 · 2017年12月1日

VIP会员

文章信息

相关主题

局部极大点

Neural Networks

相关VIP内容

【ICML2021】动量残差神经网络

专知会员服务

31+阅读 · 2021年7月19日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

Yoshua Bengio，使算法知道“为什么”

Yoshua Bengio，使算法知道“为什么”

专知会员服务

8+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

扩散模型中的 Transformer：图像生成及其延展应用询问 ChatGPT

281页pdf《神经网络设计入门》

【普林斯顿博士论文】以奖励推动生成式人工智能的发展：奖励引导生成的理论与方法

中文版 | 火力支援与巡飞弹药的未来（附原文）

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

深度卷积神经网络中的降采样

深度卷积神经网络中的降采样

极市平台

12+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum

Arxiv

1+阅读 · 2021年11月11日

Performance of Bayesian linear regression in a model with mismatch

Arxiv

0+阅读 · 2021年11月10日

Towards Gradient-based Bilevel Optimization with Non-convex Followers and Beyond

Arxiv

5+阅读 · 2021年10月1日

Large Scale Private Learning via Low-rank Reparametrization

Arxiv

5+阅读 · 2021年6月17日

Momentum Residual Neural Networks

Arxiv

7+阅读 · 2021年5月13日

Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization

Arxiv

4+阅读 · 2020年2月14日

Risk-Aware Active Inverse Reinforcement Learning

Risk-Aware Active Inverse Reinforcement Learning

Arxiv

8+阅读 · 2019年1月8日

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+阅读 · 2018年7月16日

A Study on Overfitting in Deep Reinforcement Learning

Arxiv

7+阅读 · 2018年4月20日

Optimal Algorithms for Distributed Optimization

Arxiv

3+阅读 · 2017年12月1日

微信扫码咨询专知VIP会员