This paper studies the global convergence of gradient descent for deep ReLU networks under the square loss. For this setting, the current state-of-the-art results show that gradient descent converges to a global optimum if the widths of all the hidden layers scale as $\Omega(N^8)$ ($N$ being the number of training samples). We discuss a simple proof framework that allows us to improve the existing over-parameterization condition to linear, quadratic, and cubic widths (depending on the type of initialization scheme and/or the depth of the network).
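To make the setting concrete, the following is a minimal, self-contained sketch of the training setup described above: full-batch gradient descent on a deep ReLU network under the square loss, with Gaussian-initialized weights. This is only an illustration of the setup, not the paper's proof or method; the depth, widths, step size, sample count, and initialization scale are arbitrary choices and do not reflect the paper's over-parameterization conditions.

```python
# Sketch of the setting: full-batch gradient descent on a deep ReLU network
# under the square loss. All sizes and hyperparameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

N, d_in, d_out = 32, 10, 1              # N training samples (illustrative)
widths = [d_in, 256, 256, 256, d_out]   # hidden widths chosen arbitrarily wide

# Random training data and targets.
X = rng.standard_normal((N, d_in))
Y = rng.standard_normal((N, d_out))

# Gaussian (He-style) initialization; the paper's width requirements depend on
# the initialization scheme, but this particular scale is just an assumption.
W = [rng.standard_normal((widths[l], widths[l + 1])) * np.sqrt(2.0 / widths[l])
     for l in range(len(widths) - 1)]

def forward(X, W):
    """Forward pass; ReLU on all layers except the last. Returns activations."""
    acts = [X]
    h = X
    for l, Wl in enumerate(W):
        z = h @ Wl
        h = z if l == len(W) - 1 else np.maximum(z, 0.0)
        acts.append(h)
    return acts

lr = 1e-3
for step in range(2001):
    acts = forward(X, W)
    resid = acts[-1] - Y                 # gradient of 0.5 * ||out - Y||^2 w.r.t. out
    loss = 0.5 * np.sum(resid ** 2)

    # Backward pass: propagate the residual through the layers.
    grad_z = resid                       # last layer is linear
    grads = [None] * len(W)
    for l in reversed(range(len(W))):
        grads[l] = acts[l].T @ grad_z
        if l > 0:
            grad_z = (grad_z @ W[l].T) * (acts[l] > 0)   # ReLU derivative

    # Plain (full-batch) gradient descent update on all layers.
    for l in range(len(W)):
        W[l] -= lr * grads[l]

    if step % 500 == 0:
        print(f"step {step:4d}  square loss {loss:.4e}")
```

In this over-parameterized regime (hidden widths far larger than the number of samples), the printed square loss is typically driven close to zero, which is the kind of global convergence behavior the paper analyzes.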