In machine learning, stochastic gradient descent (SGD) is widely deployed to train models using highly non-convex objectives with equally complex noise models. Unfortunately, SGD theory often makes restrictive assumptions that fail to capture the non-convexity of real problems, and almost entirely ignores the complex noise models that exist in practice. In this work, we make substantial progress toward addressing this shortcoming. First, we establish that SGD's iterates will either globally converge to a stationary point or diverge under nearly arbitrary non-convexity and noise models. Under a slightly more restrictive assumption on the joint behavior of the non-convexity and noise model, one that generalizes current assumptions in the literature, we show that the objective function cannot diverge, even if the iterates diverge. As a consequence of our results, SGD can be applied to a greater range of stochastic optimization problems with confidence about its global convergence behavior and stability.
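For reference, the iteration in question is the standard SGD update; the following minimal sketch fixes notation introduced here purely for illustration (the objective $f$, step sizes $\alpha_k$, and stochastic gradient oracle $g$ are our labels, not necessarily the paper's):
$$
x_{k+1} = x_k - \alpha_k\, g(x_k, \xi_k), \qquad \mathbb{E}\big[\, g(x_k, \xi_k) \mid x_k \,\big] = \nabla f(x_k),
$$
so that "global convergence to a stationary point" refers to $\nabla f(x_k) \to 0$ along the iterate sequence, regardless of the starting point.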