This work is superseded by the paper in arXiv:2011.14066. Stochastic gradient descent (SGD) is the de facto algorithm for training deep neural networks (DNNs). Despite its popularity, it still requires fine-tuning to achieve its best performance. This has led to the development of adaptive methods, which claim to automate hyper-parameter tuning. Recently, researchers have studied both algorithmic classes via toy examples: e.g., for over-parameterized linear regression, Wilson et al. (2017) show that, while SGD always converges to the minimum-norm solution, adaptive methods show no such inclination, leading to worse generalization. Our aim is to study this conjecture further. We empirically show that the minimum weight norm is not necessarily the proper gauge of good generalization in simplified scenarios, and that the different models found by adaptive methods can outperform those of plain gradient methods. In practical DNN settings, we observe that adaptive methods can outperform SGD, producing models with larger weight norms, but without necessarily reducing the amount of tuning required.
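For context, the following is a minimal sketch (not taken from the paper) of the over-parameterized linear-regression setting referenced above: plain gradient descent initialized at zero converges to the minimum-norm interpolating solution, while a diagonal-adaptive (Adagrad-style) update generally lands elsewhere. The dimensions, step sizes, and iteration counts are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative sketch: GD vs. an Adagrad-style update on over-parameterized
# least squares. All hyper-parameters here are arbitrary choices for the demo.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                          # fewer samples than parameters
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def grad(w):
    # Gradient of the loss 0.5/n * ||Xw - y||^2
    return X.T @ (X @ w - y) / n

# Minimum-norm interpolating solution, via the pseudo-inverse.
w_min_norm = np.linalg.pinv(X) @ y

# Plain gradient descent from zero initialization.
w_gd = np.zeros(d)
for _ in range(50_000):
    w_gd -= 0.05 * grad(w_gd)

# Adagrad-style diagonal adaptive update from the same initialization.
w_ada, accum = np.zeros(d), np.zeros(d)
for _ in range(50_000):
    g = grad(w_ada)
    accum += g * g
    w_ada -= 0.05 * g / (np.sqrt(accum) + 1e-8)

print("GD distance to min-norm solution:     ", np.linalg.norm(w_gd - w_min_norm))
print("Adagrad distance to min-norm solution:", np.linalg.norm(w_ada - w_min_norm))
print("weight norms (GD, Adagrad, min-norm): ",
      np.linalg.norm(w_gd), np.linalg.norm(w_ada), np.linalg.norm(w_min_norm))
```

In this sketch, the gradient-descent iterates stay in the row space of X (each update is a linear combination of the rows), which is why they approach the pseudo-inverse solution; the coordinate-wise scaling of the adaptive update pulls its iterates off that subspace, so it typically reaches a different interpolating solution with a larger norm.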