Stochastic Subgradient Descent on a Generic Definable Function Converges to a Minimizer - Zhuanzhi Papers


Local minimum · Reviewer · Minimum point · Functional · SGD

September 6, 2021

Stochastic Subgradient Descent on a Generic Definable Function Converges to a Minimizer

Sholom Schechtman

from arxiv, 35 pages

Davis and Drusvyatskiy previously showed that every Clarke critical point of a generic, semialgebraic (and, more generally, definable in an o-minimal structure), weakly convex function lies on an active manifold and is either a local minimum or an active strict saddle. In the first part of this work, we show that when the weak convexity assumption fails, a third type of point appears: a sharply repulsive critical point. Moreover, we show that the corresponding active manifolds satisfy the Verdier and angle conditions, which we introduced in our previous work. In the second part of this work, we show that, under a density-like assumption on the perturbation sequence, stochastic subgradient descent (SGD) avoids sharply repulsive critical points with probability one. We show that such a density-like assumption can be obtained by adding a small random perturbation (e.g. a nondegenerate Gaussian) at each iteration of the algorithm. These results, combined with our previous work on the avoidance of active strict saddles, show that SGD on a generic definable (e.g. semialgebraic) function converges to a local minimum.
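The role of the random perturbation can be illustrated with a toy example. The sketch below is not taken from the paper: the function f(x) = -|x| + x²/2, the step-size schedule, the noise scale, and the names `subgradient` and `perturbed_subgradient_descent` are all assumptions chosen for illustration. The function is not weakly convex; x = 0 is a Clarke critical point (its Clarke subdifferential there is [-1, 1]) that the iterates should be pushed away from, and x = ±1 are the local minima. Here the Gaussian perturbation is added to the subgradient at each iteration, which is one plausible reading of the abstract.

```python
import numpy as np


def subgradient(x):
    """Return one Clarke subgradient of f(x) = -|x| + x**2 / 2.

    np.sign(0.0) is 0.0, so at x = 0 this returns 0, which is a valid
    element of the Clarke subdifferential [-1, 1] at that point.
    """
    return -np.sign(x) + x


def perturbed_subgradient_descent(x0, n_steps=5000, noise_scale=0.0, seed=0):
    """Subgradient descent with an optional Gaussian perturbation per step.

    The step sizes 1 / (k + 10) and the default noise scale are
    illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    x = float(x0)
    for k in range(1, n_steps + 1):
        step = 1.0 / (k + 10)  # vanishing step size
        noise = noise_scale * rng.standard_normal()  # Gaussian perturbation
        x = x - step * (subgradient(x) + noise)
    return x


# Without perturbation, an iterate started at the critical point never moves.
stuck = perturbed_subgradient_descent(0.0, noise_scale=0.0)

# With a small Gaussian perturbation, the iterate leaves x = 0 and drifts
# toward one of the local minima at x = +1 or x = -1.
escaped = perturbed_subgradient_descent(0.0, noise_scale=0.1)

print("no perturbation:  ", stuck)
print("with perturbation:", escaped)
```

In this one-dimensional toy the unperturbed run stays exactly at 0 because the chosen subgradient there is 0, while the perturbed run ends near a local minimum. It only illustrates the qualitative behaviour; the paper's result concerns generic definable functions and a density-like condition on the perturbation sequence.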

