Overparametrized neural networks, in which the number of active parameters exceeds the sample size, prove remarkably effective in modern deep learning practice. From the classical perspective, however, far fewer parameters suffice for optimal estimation and prediction, and overparametrization can be harmful even in the presence of explicit regularization. To reconcile this conflict, we present a generalization theory for overparametrized ReLU networks that incorporates an explicit regularizer based on the scaled variation norm. Interestingly, this regularizer is equivalent to ridge regularization from the perspective of gradient-based optimization, yet behaves like the group lasso in terms of controlling model complexity. By exploiting this ridge-lasso duality, we show that overparametrization is generally harmless for two-layer ReLU networks. In particular, the overparametrized estimators are minimax optimal up to a logarithmic factor. By contrast, we show that overparametrized random feature models suffer from the curse of dimensionality and are therefore suboptimal.
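As a minimal sketch of the ridge-lasso duality alluded to above (assuming the standard positive-homogeneity rescaling argument for two-layer ReLU networks; the paper's scaled variation norm may differ in its exact scaling), consider a network $f(x) = \sum_{j=1}^{m} a_j\,\sigma(w_j^\top x)$ with $\sigma(t) = \max(t,0)$. Since $a_j\,\sigma(w_j^\top x) = (a_j/c_j)\,\sigma(c_j w_j^\top x)$ for any $c_j > 0$, the network is unchanged by per-unit rescaling, and minimizing a ridge-type weight-decay penalty over these rescalings collapses to a group-lasso-type penalty:
\[
\min_{c_j > 0}\;\frac{1}{2}\sum_{j=1}^{m}\Bigl( c_j^2 \|w_j\|_2^2 + \frac{a_j^2}{c_j^2} \Bigr)
\;=\; \sum_{j=1}^{m} |a_j|\,\|w_j\|_2,
\]
with the minimum attained at $c_j^2 = |a_j|/\|w_j\|_2$ by the AM-GM inequality. In this sense an explicit ridge penalty, which gradient-based optimization handles naturally, controls the same sparsity-inducing complexity measure as a group lasso over hidden units.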