Last iterate convergence of SGD for Least-Squares in the Interpolation regime - Zhuanzhi Papers


SGD · Reproducing kernel Hilbert space · Predictor/decision function · Stochastic gradient descent · Estimation/estimator

June 2, 2021

Last iterate convergence of SGD for Least-Squares in the Interpolation regime


Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion

From arXiv; 23 pages, 1 figure, 1 appendix

Motivated by the recent successes of neural networks that can fit the data perfectly and still generalize well, we study the noiseless model in the fundamental least-squares setup. We assume that an optimal predictor perfectly fits inputs and outputs, $\langle \theta_* , \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite-dimensional non-linear feature map. To solve this problem, we consider the estimator given by the last iterate of stochastic gradient descent (SGD) with constant step-size. In this context, our contribution is twofold: (i) from a (stochastic) optimization perspective, we exhibit an archetypal problem where we can show explicitly the convergence of the final SGD iterate for a non-strongly convex problem with constant step-size, whereas usual results use some form of averaging, and (ii) from a statistical perspective, we give explicit non-asymptotic convergence rates in the over-parameterized setting and leverage a fine-grained parameterization of the problem to exhibit polynomial rates that can be faster than $O(1/T)$. The link with reproducing kernel Hilbert spaces is established.
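The setting in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experiment: the function name `run_sgd_last_iterate`, the sign-based toy features, the $1/i^2$ eigenvalue decay, the all-ones $\theta_*$, and the step size $\gamma = 1/(2\,\mathrm{tr}\,H)$ are all assumptions chosen for illustration. It runs constant step-size SGD on noiseless labels $Y = \langle \theta_*, \phi(X) \rangle$ and tracks the excess risk of the current (last) iterate, with no averaging.

```python
import numpy as np


def run_sgd_last_iterate(d=50, T=2000, seed=0):
    """Constant step-size SGD on noiseless least squares (toy sketch).

    Returns the excess risk and the distance to theta_star after every step.
    """
    rng = np.random.default_rng(seed)

    # Toy feature model (an assumption, not from the paper):
    # phi_i = sign_i * sigma_i with independent random signs, so the
    # covariance is H = diag(sigma_i^2) and ||phi||^2 = trace(H) exactly.
    sigma = 1.0 / np.arange(1, d + 1)
    H_diag = sigma ** 2
    R2 = H_diag.sum()

    theta_star = np.ones(d)       # interpolating predictor (illustrative)
    theta = np.zeros(d)           # initial iterate
    gamma = 1.0 / (2.0 * R2)      # constant step size (illustrative choice)

    def excess_risk(th):
        # 0.5 * E[(<th, phi> - Y)^2] = 0.5 * (th - th*)^T H (th - th*)
        diff = th - theta_star
        return 0.5 * float(np.sum(H_diag * diff ** 2))

    risks = [excess_risk(theta)]
    dists = [float(np.linalg.norm(theta - theta_star))]
    for _ in range(T):
        phi = sigma * rng.choice([-1.0, 1.0], size=d)   # fresh sample
        y = phi @ theta_star                            # noiseless label
        # Stochastic gradient step on 0.5 * (<theta, phi> - y)^2
        theta = theta - gamma * (phi @ theta - y) * phi
        risks.append(excess_risk(theta))
        dists.append(float(np.linalg.norm(theta - theta_star)))
    return risks, dists


if __name__ == "__main__":
    risks, dists = run_sgd_last_iterate()
    print(f"initial risk {risks[0]:.3e}, last-iterate risk {risks[-1]:.3e}")
```

In this toy model $\gamma \|\phi\|^2 = 1/2 < 2$ at every step, so the distance $\|\theta_t - \theta_*\|$ never increases. The risk of the last iterate, by contrast, is only controlled in expectation, which is the quantity the paper's rates are stated for.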



Related content

SGD

[TPAMI 2021] Robust Differentiable SVD

Zhuanzhi Member Services

23+ reads · April 10, 2021

[Classic book] Linear Algebra, 436-page PDF

Zhuanzhi Member Services

78+ reads · March 16, 2021

INRIA's latest "Machine Learning Theory" course notes, 176-page PDF

Zhuanzhi Member Services

51+ reads · December 14, 2020

A collection of image classification tricks: "Bag of Tricks for Image Classification", 17-page PPT

Zhuanzhi Member Services

96+ reads · March 12, 2020

UC Berkeley CS189 lecture notes: "A Comprehensive Guide to Machine Learning", 185-page PDF

Zhuanzhi Member Services

162+ reads · January 16, 2020

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Zhuanzhi Member Services

36+ reads · October 17, 2019

Latest reinforcement learning tutorial, 17-page PDF

Zhuanzhi Member Services

182+ reads · October 11, 2019

[New book] Fundamentals of Python Programming, 669-page PDF

Zhuanzhi Member Services

197+ reads · October 10, 2019

Experience and advice for getting started with machine learning

Zhuanzhi Member Services

94+ reads · October 10, 2019

[UC Berkeley PhD thesis] Learning to Generalize via Self-Supervised Prediction

Zhuanzhi Member Services

65+ reads · October 9, 2019

A collection of literature on robust machine learning

Zhuanzhi

8+ reads · August 18, 2019

Hierarchically Structured Meta-learning

CreateAMind

27+ reads · May 22, 2019

Transferring Knowledge across Learning Processes

CreateAMind

29+ reads · May 18, 2019

Meta learning in 2017: MAML, SNAIL

CreateAMind

11+ reads · January 2, 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+ reads · December 24, 2018

A discussion of the "disentangled" assumption

CreateAMind

9+ reads · December 10, 2018

[Paper] A summary of variational inference

Machine Learning Research Society

39+ reads · November 16, 2017

[Recommended] An in-depth look at decision trees and random forests

Machine Learning Research Society

5+ reads · September 21, 2017

[Recommended] SVM tutorial with worked examples

Machine Learning Research Society

17+ reads · August 26, 2017

[Learning] Hierarchical Softmax

Machine Learning Research Society

4+ reads · August 6, 2017

Consensus-Based Optimization on the Sphere: Convergence to Global Minimizers and Machine Learning

arXiv

0+ reads · July 28, 2021

On the Role of Optimization in Double Descent: A Least Squares Study

arXiv

0+ reads · July 27, 2021

Minimizing Nonsmooth Convex Functions with Variable Accuracy

arXiv

0+ reads · July 26, 2021

Implicit bias of gradient descent for mean squared error regression with wide neural networks

arXiv

0+ reads · July 25, 2021

Low-bandwidth recovery of linear functions of Reed-Solomon-encoded data

arXiv

0+ reads · July 25, 2021

Coverage Error Optimal Confidence Intervals for Local Polynomial Regression

arXiv

0+ reads · July 23, 2021

Empirical Risk Minimization in the Interpolating Regime with Application to Neural Network Learning

arXiv

0+ reads · July 23, 2021

A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers

arXiv

0+ reads · July 22, 2021

Scaling Properties of Deep Residual Networks

arXiv

13+ reads · May 25, 2021

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

arXiv

7+ reads · June 1, 2018




