Escaping Saddle Points with Stochastically Controlled Stochastic Gradient Methods - Zhuanzhi Papers


Saddle points · SGD · Stationary points · Noise · Stationary

April 23, 2021


Guannan Liang, Qianqian Tong, Chunjiang Zhu, Jinbo Bi

Stochastically controlled stochastic gradient (SCSG) methods have been proved to converge efficiently to first-order stationary points, which, however, can be saddle points in nonconvex optimization. It has been observed that a stochastic gradient descent (SGD) step introduces anisotropic noise around saddle points in deep learning and non-convex half-space learning problems, which indicates that SGD satisfies the correlated negative curvature (CNC) condition for these problems. We therefore propose using a separate SGD step to help the SCSG method escape from strict saddle points, resulting in the CNC-SCSG method. The SGD step plays a role similar to noise injection but is more stable. We prove that the resulting algorithm converges to a second-order stationary point with a convergence rate of $\tilde{O}(\epsilon^{-2}\log(1/\epsilon))$, where $\epsilon$ is the pre-specified error tolerance. This convergence rate is independent of the problem dimension and is faster than that of CNC-SGD. A more general framework is further designed to incorporate the proposed CNC-SCSG into any first-order method so that the method can escape saddle points. Simulation studies illustrate that the proposed algorithm can escape saddle points in far fewer epochs than gradient descent methods perturbed by either noise injection or an SGD step.
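The abstract describes CNC-SCSG only at a high level, so the following is a minimal pure-Python sketch under stated assumptions, not the authors' algorithm. It follows the usual SCSG template as we understand it (a large-batch snapshot gradient, then a geometrically distributed number of variance-reduced minibatch steps) and adds one plain SGD step whenever the snapshot gradient norm drops below a threshold. The toy objective, the trigger rule, and all names and constants (`make_problem`, `batch_grad`, `cnc_scsg_sketch`, `eps`, `eta_sgd`, the offsets `a_i`) are hypothetical. The sample gradients differ only along the negative-curvature direction at the saddle, which loosely mimics the CNC condition, and the snapshot uses all n samples so that, without the SGD step, the iterate stays exactly on the line y = 0.

```python
import math
import random


def make_problem(n):
    # Hypothetical finite sum f(w) = (1/n) * sum_i f_i(w) with
    # f_i(x, y) = x**2 / 2 + y**4 / 4 - y**2 / 2 + a_i * y, a_i = +1 or -1 (n even).
    # The a_i cancel in the average, so f has a strict saddle at (0, 0) and
    # minima at (0, 1) and (0, -1); sample gradients differ only along y.
    a = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]

    def grad_i(i, w):
        x, y = w
        return (x, y ** 3 - y + a[i])

    return grad_i


def batch_grad(grad_i, idx, w):
    # Average of the component gradients over the index set idx.
    gx = gy = 0.0
    for i in idx:
        dx, dy = grad_i(i, w)
        gx += dx
        gy += dy
    return (gx / len(idx), gy / len(idx))


def cnc_scsg_sketch(grad_i, n, w0, epochs=200, big_batch=None, mini_batch=10,
                    eta=0.1, eta_sgd=0.05, eps=1e-3, sgd_escape=True, seed=0):
    rng = random.Random(seed)
    big_batch = n if big_batch is None else big_batch
    gamma = big_batch / (big_batch + mini_batch)
    w = w0
    for _ in range(epochs):
        # SCSG outer step: snapshot gradient on a batch of size big_batch.
        snap = w
        g = batch_grad(grad_i, rng.sample(range(n), big_batch), snap)
        if sgd_escape and math.hypot(*g) <= eps:
            # Assumed trigger: near a first-order stationary point, take one
            # plain single-sample SGD step instead of an SCSG epoch.
            dx, dy = grad_i(rng.randrange(n), w)
            w = (w[0] - eta_sgd * dx, w[1] - eta_sgd * dy)
            continue
        # Inner-loop length N with P(N = k) = gamma**k * (1 - gamma),
        # so E[N] = big_batch / mini_batch.
        steps = 0
        while rng.random() < gamma:
            steps += 1
        for _ in range(steps):
            idx = rng.sample(range(n), mini_batch)
            cx, cy = batch_grad(grad_i, idx, w)
            sx, sy = batch_grad(grad_i, idx, snap)
            # Variance-reduced direction: minibatch correction plus snapshot gradient.
            w = (w[0] - eta * (cx - sx + g[0]), w[1] - eta * (cy - sy + g[1]))
    return w


grad_i = make_problem(100)
stalled = cnc_scsg_sketch(grad_i, 100, (1.0, 0.0), sgd_escape=False)
escaped = cnc_scsg_sketch(grad_i, 100, (1.0, 0.0), sgd_escape=True)
print(stalled)  # y stays exactly 0.0: the iterate converges to the saddle (0, 0)
print(escaped)  # y ends near +1 or -1: a local minimum
```

With `sgd_escape=False` the y-coordinate never leaves 0.0, so the iterates converge to the saddle, which is a first-order stationary point. A single SGD step injects sample noise along y, and the negative curvature then carries the iterate to a minimum near (0, ±1). Using a plain SGD step here, rather than adding isotropic Gaussian noise, mirrors the abstract's remark that the SGD step plays the role of noise injection.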



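In mathematics, a saddle point (or minimax point) is a point on the surface of a function's graph where the slopes (derivatives) in orthogonal directions are all zero, but which is not a local extremum of the function. A typical example is a critical point that is a relative minimum along one axial direction (between peaks) and a relative maximum along the crossing axis.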
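As a small worked illustration (not taken from the paper), $f(x, y) = x^2 - y^2$ has zero gradient at the origin, and its Hessian $\mathrm{diag}(2, -2)$ has one positive and one negative eigenvalue, so the origin is a minimum along $x$ and a maximum along $y$. Because its smallest Hessian eigenvalue is strictly negative, it is also a strict saddle point, the kind the abstract's method is designed to escape. The helper names below are hypothetical; the sketch applies the standard second-order test.

```python
import math


def grad(x, y):
    # Gradient of f(x, y) = x**2 - y**2.
    return (2.0 * x, -2.0 * y)


def hessian(x, y):
    # The Hessian of f is constant: [[2, 0], [0, -2]].
    return ((2.0, 0.0), (0.0, -2.0))


def sym_eigenvalues_2x2(h):
    # Closed-form eigenvalues (ascending) of a symmetric matrix [[a, b], [b, d]].
    (a, b), (_, d) = h
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return (mean - radius, mean + radius)


def classify(x, y, tol=1e-12):
    # Second-order test at (x, y).
    if math.hypot(*grad(x, y)) > tol:
        return "not a critical point"
    lo, hi = sym_eigenvalues_2x2(hessian(x, y))
    if lo > tol:
        return "local minimum"
    if hi < -tol:
        return "local maximum"
    if lo < -tol and hi > tol:
        return "saddle point"
    return "degenerate: second-order test inconclusive"


print(classify(0.0, 0.0))  # saddle point
print(classify(1.0, 0.0))  # not a critical point
```

Plain gradient descent started exactly on the $x$-axis never picks up a $y$-component here, which is why first-order methods can stall at such points without some source of noise.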
