In this article, we study the convergence of stochastic gradient descent (SGD) schemes under weak assumptions on the underlying landscape. More explicitly, we show that, on the event that the SGD stays local, the scheme converges if there are only countably many critical points, or if the target function (landscape) satisfies Łojasiewicz inequalities around all critical levels, as all analytic functions do. In particular, we show that for neural networks with an analytic activation function, such as softplus, the sigmoid, or the hyperbolic tangent, SGD converges on the event of staying local, provided the random variables modelling the signal and response in the training are compactly supported.
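For orientation, the two objects named above can be written in standard notation; the symbols $f$, $x^\ast$, $X_n$, $\gamma_n$, $D_{n+1}$, $\theta$, $C$ and $U$ are illustrative choices and are not taken from the paper itself. A Łojasiewicz inequality around a critical point $x^\ast$ of the target function $f$ asserts that there exist a neighbourhood $U$ of $x^\ast$, a constant $C>0$ and an exponent $\theta\in(0,1)$ such that
\[
  \lvert f(x) - f(x^\ast)\rvert^{\theta} \;\le\; C\,\lVert \nabla f(x)\rVert
  \qquad \text{for all } x \in U,
\]
a condition satisfied, in particular, by every real analytic $f$. The SGD schemes in question are recursions of the generic form
\[
  X_{n+1} \;=\; X_n \;-\; \gamma_n\bigl(\nabla f(X_n) + D_{n+1}\bigr),
\]
with positive step sizes $\gamma_n$ and random perturbations $D_{n+1}$ of the true gradient.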