Stochastic Subgradient Descent on a Generic Definable Function Converges to a Minimizer - Zhuanzhi Papers


October 1, 2021

Stochastic Subgradient Descent on a Generic Definable Function Converges to a Minimizer


Sholom Schechtman

From arXiv, 36 pages, 2 figures

Davis and Drusvyatskiy previously showed that every Clarke critical point of a generic, semialgebraic (and, more generally, definable in an o-minimal structure), weakly convex function lies on an active manifold and is either a local minimum or an active strict saddle. In the first part of this work, we show that when the weak convexity assumption fails, a third type of point appears: a sharply repulsive critical point. Moreover, we show that the corresponding active manifolds satisfy the Verdier and the angle conditions, which we introduced in our previous work. In the second part of this work, we show that, under a density-like assumption on the perturbation sequence, stochastic subgradient descent (SGD) avoids sharply repulsive critical points with probability one. We show that such a density-like assumption can be obtained by adding a small random perturbation (e.g. a nondegenerate Gaussian) at each iteration of the algorithm. These results, combined with our previous work on the avoidance of active strict saddles, show that SGD on a generic definable (e.g. semialgebraic) function converges to a local minimum.
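To make the role of the random perturbation concrete, here is a minimal Python sketch of perturbed subgradient descent on a one-dimensional toy function. Everything in it is an assumption chosen for illustration and is not taken from the paper: the function f(x) = ||x| - 1|, the step sizes 1/(k + 10), the noise scaling, and the names `f`, `subgradient`, and `perturbed_sgd`. The origin of this toy function is a Clarke critical point that is a local maximum, so it only conveys the flavor of a repulsive nonsmooth critical point; it is not a verified instance of the paper's definition of a sharply repulsive point.

```python
import numpy as np


def f(x):
    """Toy semialgebraic function: minima at x = -1 and x = 1, kink at x = 0."""
    return abs(abs(x) - 1.0)


def subgradient(x):
    """Return one element of the Clarke subdifferential of f at x.

    At the kinks (x = 0 and |x| = 1) the subdifferential is [-1, 1];
    we pick 0 there, which is a valid but adversarial selection at x = 0.
    """
    if x == 0.0 or abs(x) == 1.0:
        return 0.0
    return float(np.sign(x) * np.sign(abs(x) - 1.0))


def perturbed_sgd(x0, n_iters=5000, noise_std=1.0, seed=0):
    """Run x_{k+1} = x_k - step_k * (g_k + noise_k) with Gaussian noise.

    The vanishing step size 1 / (k + 10) and the way the noise enters
    the update are illustrative assumptions, not the paper's scheme.
    """
    rng = np.random.default_rng(seed)
    x = float(x0)
    for k in range(n_iters):
        step = 1.0 / (k + 10)
        noise = noise_std * rng.standard_normal()
        x = x - step * (subgradient(x) + noise)
    return x


if __name__ == "__main__":
    # Without noise, the iterate never leaves the critical point x = 0.
    print("no noise:  ", perturbed_sgd(0.0, noise_std=0.0))
    # With Gaussian noise, the iterate leaves 0 and settles near -1 or 1.
    print("with noise:", perturbed_sgd(0.0, noise_std=1.0))
```

Started exactly at the origin with the zero subgradient selected, the unperturbed recursion stays there forever, whereas the perturbed recursion is pushed off the kink and then drifts toward one of the two minimizers. This is only a one-dimensional caricature of the avoidance result stated in the abstract.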


