Normalization layers (e.g., Batch Normalization, Layer Normalization) were introduced to ease optimization difficulties in very deep nets, but they clearly also help generalization, even in not-so-deep nets. Motivated by the long-held belief that flatter minima lead to better generalization, this paper gives a mathematical analysis and supporting experiments suggesting that normalization (together with the accompanying weight decay) encourages gradient descent (GD) to reduce the sharpness of the loss surface. Here "sharpness" is carefully defined, since the loss is scale-invariant, a known consequence of normalization. Specifically, for a fairly broad class of neural nets with normalization, our theory explains how GD with a finite learning rate enters the so-called Edge of Stability (EoS) regime, and it characterizes the trajectory of GD in this regime via a continuous sharpness-reduction flow.
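To make these quantities concrete, the sketch below (illustrative only, not code from the paper; the toy model, the names `toy_loss` and `sharpness`, and all hyperparameters are assumptions) builds a scale-invariant loss by normalizing the weights of a linear model, runs GD with weight decay at a finite learning rate eta, and estimates sharpness as the top Hessian eigenvalue via power iteration on finite-difference Hessian-vector products, printing it alongside the classical EoS threshold 2/eta.

```python
# Minimal sketch of two quantities from the abstract, on a toy problem:
# (1) normalization makes the loss scale-invariant in the weights, and
# (2) "sharpness" (top Hessian eigenvalue) can be tracked along a GD-with-
# weight-decay trajectory and compared to the EoS threshold 2/eta.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))   # toy inputs
y = rng.normal(size=64)        # toy targets
eps = 1e-8

def toy_loss(w):
    """Squared loss of a linear model whose weights are normalized before
    use, so toy_loss(c * w) == toy_loss(w) for any c > 0: the loss is
    scale-invariant, the known consequence of normalization noted above."""
    w_hat = w / (np.linalg.norm(w) + eps)
    return 0.5 * np.mean((X @ w_hat - y) ** 2)

def grad(w, h=1e-5):
    """Central-difference gradient (accurate enough for a toy demo)."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (toy_loss(w + e) - toy_loss(w - e)) / (2 * h)
    return g

def sharpness(w, iters=50, h=1e-4):
    """Top Hessian eigenvalue via power iteration on finite-difference
    Hessian-vector products: H v ~ (grad(w + h v) - grad(w - h v)) / (2 h)."""
    v = rng.normal(size=w.size)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = (grad(w + h * v) - grad(w - h * v)) / (2 * h)
        lam = float(v @ hv)                 # Rayleigh quotient estimate
        v = hv / (np.linalg.norm(hv) + eps)
    return lam

# Scale-invariance check: rescaling the weights leaves the loss unchanged.
w0 = rng.normal(size=8)
assert np.isclose(toy_loss(w0), toy_loss(3.0 * w0))

eta = 0.1    # finite learning rate
wd = 1e-2    # weight decay, as paired with normalization in the abstract
w = w0.copy()
for t in range(2000):
    w -= eta * (grad(w) + wd * w)          # GD step with weight decay
    if t % 500 == 0:
        print(f"step {t:4d}  loss {toy_loss(w):.4f}  "
              f"sharpness {sharpness(w):.3f}  2/eta = {2/eta:.3f}")
```

Whether this toy problem actually settles into the EoS regime depends on the step size and data; the sketch is only meant to show how scale-invariance can be verified and how sharpness is measured along a GD trajectory, not to reproduce the paper's analysis.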