Global Convergence of SGD On Two Layer Neural Nets - Zhuanzhi Papers

SGD · Global convergence · Activation functions · Regularization · Neural networks

April 8, 2023

Global Convergence of SGD On Two Layer Neural Nets

Pulkit Gopalani, Anirbit Mukherjee

From arXiv, 23 pages, 6 figures. Extended abstract accepted at DeepMath 2022. v2 update: new experiments added in Section 3.2 to study the effect of the regularization value; the statement of Theorem 3.4 about SoftPlus nets has been improved.

In this note we demonstrate provable convergence of SGD to the global minima of the appropriately regularized $\ell_2$-empirical risk of depth $2$ nets, for arbitrary data and with any number of gates, provided they use adequately smooth and bounded activations like sigmoid and tanh. We build on the results in [1] and leverage a constant amount of Frobenius norm regularization on the weights, along with sampling of the initial weights from an appropriate distribution. We also give a continuous time SGD convergence result that also applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of loss functions on constant sized neural nets which are "Villani Functions".

[1] Bin Shi, Weijie J. Su, and Michael I. Jordan. On learning rates and Schrödinger operators, 2020. arXiv:2004.06977
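The setup described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a depth-2 net $f(x) = \sum_j a_j \sigma(\langle w_j, x \rangle)$ with sigmoid gates, fixed outer weights $a$, and a trained first layer $W$. The objective is assumed to be $\frac{1}{n}\sum_i \frac{1}{2}(y_i - f(x_i))^2 + \frac{\lambda}{2}\|W\|_F^2$. The abstract does not state the regularization threshold, the step-size condition, or the initialization distribution, so `lam`, `lr`, the Gaussian initialization, and the toy data below are all hypothetical placeholders.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(W, a, x):
    # Depth-2 net: sum_j a_j * sigmoid(<w_j, x>), outer weights a held fixed (assumption).
    return sum(a_j * sigmoid(sum(w * xi for w, xi in zip(W_j, x)))
               for a_j, W_j in zip(a, W))


def regularized_risk(W, a, data, lam):
    # Squared-loss empirical risk plus (lam / 2) * squared Frobenius norm of W.
    mse = sum(0.5 * (y - predict(W, a, x)) ** 2 for x, y in data) / len(data)
    frob_sq = sum(w * w for row in W for w in row)
    return mse + 0.5 * lam * frob_sq


def sgd_step(W, a, x, y, lam, lr):
    # One SGD step on a single sample: loss gradient plus the regularizer gradient lam * W.
    residual = predict(W, a, x) - y
    new_W = []
    for a_j, W_j in zip(a, W):
        s = sigmoid(sum(w * xi for w, xi in zip(W_j, x)))
        scale = residual * a_j * s * (1.0 - s)  # chain rule through the sigmoid gate
        new_W.append([w - lr * (scale * xi + lam * w) for w, xi in zip(W_j, x)])
    return new_W


def train(data, width, lam=0.1, lr=0.05, steps=2000, seed=0):
    rng = random.Random(seed)
    dim = len(data[0][0])
    a = [1.0 if j % 2 == 0 else -1.0 for j in range(width)]  # hypothetical fixed outer layer
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]  # assumed init
    initial = regularized_risk(W, a, data, lam)
    for _ in range(steps):
        x, y = rng.choice(data)  # sample one training point uniformly
        W = sgd_step(W, a, x, y, lam, lr)
    return W, a, initial, regularized_risk(W, a, data, lam)


# Toy data, purely illustrative.
data = [([0.0, 1.0], 0.5), ([1.0, 0.0], -0.5), ([1.0, 1.0], 0.0), ([-1.0, 0.5], 0.3)]
W, a, initial_risk, final_risk = train(data, width=4)
print(initial_risk, final_risk)
```

A drop in the regularized risk on this toy problem only shows that the sketch runs as intended. It says nothing about the global-convergence guarantee, which depends on the conditions stated in the paper.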
