In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle points and converge to local minimizers much faster. A fundamental aspect of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions satisfied by the stochastic gradients in the over-parametrized setting, the first-order oracle complexity of the Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an $\epsilon$-local-minimizer matches the corresponding deterministic rate of $\tilde{\mathcal{O}}(1/\epsilon^{2})$. We next analyze the Stochastic Cubic-Regularized Newton (SCRN) algorithm under the same interpolation-like conditions and show that its oracle complexity to reach an $\epsilon$-local-minimizer is $\tilde{\mathcal{O}}(1/\epsilon^{2.5})$. While this complexity is better than the corresponding complexity of either PSGD or SCRN without interpolation-like assumptions, it does not match the rate of $\tilde{\mathcal{O}}(1/\epsilon^{1.5})$ of the deterministic Cubic-Regularized Newton method; it appears that further Hessian-based interpolation-like assumptions are necessary to bridge this gap. We also discuss the corresponding improved complexities in the zeroth-order setting.
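As a concrete illustration, interpolation-like gradient conditions of this kind are commonly formalized in the literature through a strong-growth-type bound; a representative (hedged) version, where $g(x;\xi)$ denotes an unbiased stochastic gradient of the objective $F$ and $\rho \ge 0$ is a constant (the exact form and constants used in our analysis may differ), reads $\mathbb{E}_{\xi}\big\| g(x;\xi) - \nabla F(x) \big\|^{2} \le \rho\,\big\| \nabla F(x) \big\|^{2}$ for all $x$, so that the stochastic-gradient noise vanishes at stationary points of $F$, reflecting the fact that an interpolating over-parametrized model has zero gradient on every training sample at a global minimizer.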