重新思考神经网络普遍化的双向-差异权衡取舍 (Rethinking Bias-Variance Trade-off for Generalization of Neural Networks) - 专知论文

会员服务 ·

0

方差 · 有偏 · Networking · Neural Networks · 单峰值 ·

2020 年 12 月 8 日

Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

翻译：重新思考神经网络普遍化的双向-差异权衡取舍

Zitong Yang,Yaodong Yu,Chong You,Jacob Steinhardt,Yi Ma

The classical bias-variance trade-off predicts that bias decreases and variance increase with model complexity, leading to a U-shaped risk curve. Recent work calls this into question for neural networks and other over-parameterized models, for which it is often observed that larger models generalize better. We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network. We vary the network architecture, loss function, and choice of dataset and confirm that variance unimodality occurs robustly for all models we considered. The risk curve is the sum of the bias and variance curves and displays different qualitative shapes depending on the relative scale of bias and variance, with the double descent curve observed in recent literature as a special case. We corroborate these empirical results with a theoretical analysis of two-layer linear networks with random first layer. Finally, evaluation on out-of-distribution data shows that most of the drop in accuracy comes from increased bias while variance increases by a relatively small amount. Moreover, we find that deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data.

翻译：典型的偏差权衡法预测,偏差会随着模型复杂度而减少和差异增加,从而导致U形风险曲线。最近的工作使得神经网络和其他超度参数模型对此产生疑问,人们经常看到,较大的模型会更概括化。我们通过测量神经网络的偏差和差异来简单解释这一点:虽然偏差与古典理论一样单向缩小,但差异是单向的或钟形的:随着网络宽度的扩大而增加。我们改变网络结构、损失功能和数据集的选择,并证实我们所考虑的所有模型都出现强烈的偏差和差异。风险曲线是偏差和差异曲线的总和,根据偏差和差异的相对规模显示不同的质量形状,最近文献中观察到的双向下降曲线是一个特殊案例。我们用对一级随机的两层线性网络的理论分析来证实这些经验性结果。最后,对分配外数据的评估表明,大多数准确性下降是由于偏差的增加,而差异则因相对小的偏差程度而增加。此外,我们发现,更深的模型会减少,数据会缩小。

0

相关内容

AAAI2021 | 图神经网络的异质图结构学习，Heterogeneous Graph Structure Learning for Graph Neural Networks

专知会员服务

92+阅读 · 2021年1月20日

【清华大学】图随机神经网络，Graph Random Neural Networks

【清华大学】图随机神经网络，Graph Random Neural Networks

专知会员服务

156+阅读 · 2020年5月26日

【论文推荐】二值神经网络综述，Binary Neural Networks: A Survey

【论文推荐】二值神经网络综述，Binary Neural Networks: A Survey

专知会员服务

53+阅读 · 2020年4月8日

最大均方差正则化贝叶斯神经网络，Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

最大均方差正则化贝叶斯神经网络，Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

专知会员服务

54+阅读 · 2020年3月5日

【ICLR2020】深度神经网络优化轨迹的平衡点，The Break-Even Point on Optimization Trajectories of Deep Neural Networks

【ICLR2020】深度神经网络优化轨迹的平衡点，The Break-Even Point on Optimization Trajectories of Deep Neural Networks

专知会员服务

34+阅读 · 2020年2月27日

【北京智源大会2019】神经网络的优化Optimization for Overparametrized Deep Neural Networks，北京大学 | 王立威

【北京智源大会2019】神经网络的优化Optimization for Overparametrized Deep Neural Networks，北京大学 | 王立威

专知会员服务

23+阅读 · 2019年11月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

BAT机器学习面试题1000题（316~320题）

BAT机器学习面试题1000题（316~320题）

七月在线实验室

14+阅读 · 2018年1月18日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Improving the convergence of SGD through adaptive batch sizes

Arxiv

0+阅读 · 2021年2月9日

Consensus Control for Decentralized Deep Learning

Arxiv

0+阅读 · 2021年2月9日

Stability and Generalization of the Decentralized Stochastic Gradient Descent

Arxiv

0+阅读 · 2021年2月9日

What causes the test error? Going beyond bias-variance via ANOVA

Arxiv

0+阅读 · 2021年2月8日

Interval Universal Approximation for Neural Networks

Arxiv

0+阅读 · 2021年2月8日

Domain Adversarial Neural Networks for Domain Generalization: When It Works and How to Improve

Arxiv

0+阅读 · 2021年2月7日

Benign Overfitting and Noisy Features

Arxiv

0+阅读 · 2021年2月4日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Arxiv

13+阅读 · 2020年6月24日

Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks

Arxiv

6+阅读 · 2020年6月15日

Training Generative Adversarial Networks Via Turing Test

Training Generative Adversarial Networks Via Turing Test

Arxiv

3+阅读 · 2018年10月25日

VIP会员

文章信息

相关主题

Neural Networks

相关VIP内容

AAAI2021 | 图神经网络的异质图结构学习，Heterogeneous Graph Structure Learning for Graph Neural Networks

专知会员服务

92+阅读 · 2021年1月20日

【清华大学】图随机神经网络，Graph Random Neural Networks

【清华大学】图随机神经网络，Graph Random Neural Networks

专知会员服务

156+阅读 · 2020年5月26日

【论文推荐】二值神经网络综述，Binary Neural Networks: A Survey

【论文推荐】二值神经网络综述，Binary Neural Networks: A Survey

专知会员服务

53+阅读 · 2020年4月8日

最大均方差正则化贝叶斯神经网络，Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

最大均方差正则化贝叶斯神经网络，Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

专知会员服务

54+阅读 · 2020年3月5日

【ICLR2020】深度神经网络优化轨迹的平衡点，The Break-Even Point on Optimization Trajectories of Deep Neural Networks

【ICLR2020】深度神经网络优化轨迹的平衡点，The Break-Even Point on Optimization Trajectories of Deep Neural Networks

专知会员服务

34+阅读 · 2020年2月27日

【北京智源大会2019】神经网络的优化Optimization for Overparametrized Deep Neural Networks，北京大学 | 王立威

【北京智源大会2019】神经网络的优化Optimization for Overparametrized Deep Neural Networks，北京大学 | 王立威

专知会员服务

23+阅读 · 2019年11月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《步兵小单元山地严寒作战指南》美军最新条令200页

《联合作战概念的发展》最新报告

俄制无人机弹药

《复杂场景下自主着陆的模型预测控制技术》92页

相关资讯

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

BAT机器学习面试题1000题（316~320题）

BAT机器学习面试题1000题（316~320题）

七月在线实验室

14+阅读 · 2018年1月18日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Improving the convergence of SGD through adaptive batch sizes

Arxiv

0+阅读 · 2021年2月9日

Consensus Control for Decentralized Deep Learning

Arxiv

0+阅读 · 2021年2月9日

Stability and Generalization of the Decentralized Stochastic Gradient Descent

Arxiv

0+阅读 · 2021年2月9日

What causes the test error? Going beyond bias-variance via ANOVA

Arxiv

0+阅读 · 2021年2月8日

Interval Universal Approximation for Neural Networks

Arxiv

0+阅读 · 2021年2月8日

Domain Adversarial Neural Networks for Domain Generalization: When It Works and How to Improve

Arxiv

0+阅读 · 2021年2月7日

Benign Overfitting and Noisy Features

Arxiv

0+阅读 · 2021年2月4日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Arxiv

13+阅读 · 2020年6月24日

Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks

Arxiv

6+阅读 · 2020年6月15日

Training Generative Adversarial Networks Via Turing Test

Training Generative Adversarial Networks Via Turing Test

Arxiv

3+阅读 · 2018年10月25日

微信扫码咨询专知VIP会员