Finding parameters of a deep neural network (NN) that fit training data is a nonconvex optimization problem, yet a basic first-order optimization method, gradient descent, finds a global solution with perfect fit (zero training loss) in many practical situations. We examine this phenomenon for Residual Neural Networks (ResNets) with smooth activation functions in a limiting regime in which both the number of layers (depth) and the number of neurons in each layer (width) go to infinity. First, we use a mean-field-limit argument to prove that, in this large-NN limit, gradient descent on the parameters becomes a partial differential equation (PDE) that characterizes a gradient flow for a probability distribution. Next, we show that the solution to this PDE converges, as training time goes to infinity, to a zero-loss solution. Together, these results imply that training the ResNet by gradient descent also yields near-zero loss, provided the ResNet is large enough. We give estimates of the depth and width needed to reduce the loss below a given threshold, with high probability.
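As a rough, hedged sketch of the setting described above (the notation $X$, $f$, $\rho$, and $L$ is illustrative and assumed, not taken from the abstract), a standard mean-field formulation writes the infinitely deep, infinitely wide ResNet as an ODE in the depth variable $t \in [0,1]$,
\[
\frac{d}{dt} X(t) \;=\; \int f\bigl(X(t), \theta\bigr)\, \rho(\theta, t)\, d\theta ,
\]
where $\rho(\cdot, t)$ is the distribution of neuron parameters at depth $t$ and $f$ is built from the smooth activation. Gradient-descent training of the parameters then corresponds, in the large-NN limit, to a gradient-flow PDE in the training-time variable $s$,
\[
\partial_s \rho(\theta, t; s) \;=\; \nabla_\theta \cdot \Bigl( \rho(\theta, t; s)\, \nabla_\theta \frac{\delta L}{\delta \rho}(\theta, t; s) \Bigr),
\]
so that the loss $L(\rho(\cdot,\cdot\,; s))$ decreases along $s$; the zero-loss statement corresponds to $L \to 0$ as $s \to \infty$.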