Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning - Zhuanzhi Papers

SGD · Noise · Networking · Performer · Learning

January 31, 2023

Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning

Antonio Sclocchi, Mario Geiger, Matthieu Wyart

From arXiv, 18 pages, 14 figures.

Understanding when the noise in stochastic gradient descent (SGD) affects generalization of deep neural networks remains a challenge, complicated by the fact that networks can operate in distinct training regimes. Here we study how the magnitude of this noise $T$ affects performance as the size of the training set $P$ and the scale of initialization $\alpha$ are varied. For gradient descent, $\alpha$ is a key parameter that controls if the network is 'lazy' ($\alpha\gg 1$) or instead learns features ($\alpha\ll 1$). For classification of MNIST and CIFAR10 images, our central results are: (i) obtaining phase diagrams for performance in the $(\alpha,T)$ plane. They show that SGD noise can be detrimental or instead useful depending on the training regime. Moreover, although increasing $T$ or decreasing $\alpha$ both allow the net to escape the lazy regime, these changes can have opposite effects on performance. (ii) Most importantly, we find that key dynamical quantities (including the total variations of weights during training) depend on both $T$ and $P$ as power laws, and the characteristic temperature $T_c$, where the noise of SGD starts affecting performance, is a power law of $P$. These observations indicate that a key effect of SGD noise occurs late in training, by affecting the stopping process whereby all data are fitted. We argue that due to SGD noise, nets must develop a stronger 'signal', i.e. larger informative weights, to fit the data, leading to a longer training time. The same effect occurs at larger training set $P$. We confirm this view in the perceptron model, where signal and noise can be precisely measured. Interestingly, exponents characterizing the effect of SGD depend on the density of data near the decision boundary, as we explain.
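The perceptron argument above (SGD noise forces a larger informative weight before the data are fitted) can be explored with a toy experiment. The following is a minimal Python sketch, not the authors' code or exact setup; all function names (`make_gapped_data`, `margin`, `train_hinge_perceptron`, `summarize`) are hypothetical. It assumes that the noise magnitude is set by the learning-rate-to-batch-size ratio, T = η/B, that inputs are Gaussian and labelled by the sign of their first coordinate, and that points within a fixed gap of the decision boundary are discarded so the hinge loss can reach exactly zero. Training stops once every point has margin y f(x) ≥ 1, i.e. once all data are fitted, and the sketch then reports the informative weight `w[0]` next to the norm of the uninformative weights.

```python
# Hypothetical toy sketch; not the authors' code or experimental setup.
import math
import random


def make_gapped_data(P, d, gap, seed=0):
    """Toy data (hypothetical): Gaussian inputs labelled by the sign of x[0].

    Points with |x[0]| < gap are rejected, so the teacher direction e_1
    separates the data with a finite margin and the hinge loss can reach zero.
    """
    rng = random.Random(seed)
    X, y = [], []
    while len(X) < P:
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if abs(x[0]) < gap:
            continue  # too close to the decision boundary x[0] = 0
        X.append(x)
        y.append(1.0 if x[0] > 0 else -1.0)
    return X, y


def margin(w, x, label, scale):
    """Margin y * f(x) of the linear predictor f(x) = scale * (w . x)."""
    return label * (scale * sum(wj * xj for wj, xj in zip(w, x)))


def train_hinge_perceptron(X, y, lr, batch_size, max_epochs=10_000, seed=0):
    """Minibatch SGD on the hinge loss max(0, 1 - y f(x)), f(x) = w.x / sqrt(d).

    Stops after the first epoch with no active hinge, i.e. once every training
    point has margin >= 1. Assumed noise magnitude: T = lr / batch_size.
    Returns (epochs_used, w).
    """
    rng = random.Random(seed)
    P, d = len(X), len(X[0])
    scale = 1.0 / math.sqrt(d)
    w = [0.0] * d
    order = list(range(P))
    for epoch in range(1, max_epochs + 1):
        rng.shuffle(order)
        any_active = False
        for start in range(0, P, batch_size):
            batch = order[start:start + batch_size]
            grad = [0.0] * d
            for i in batch:
                if margin(w, X[i], y[i], scale) < 1.0:  # hinge is active
                    any_active = True
                    for j in range(d):
                        grad[j] -= y[i] * scale * X[i][j]
            for j in range(d):
                w[j] -= lr * grad[j] / len(batch)
        if not any_active:  # w did not move this epoch, so all margins >= 1
            return epoch, w
    raise RuntimeError("training set not fitted within max_epochs")


def summarize(w):
    """Informative weight w[0] versus the norm of the uninformative weights."""
    signal = w[0]
    noise = math.sqrt(sum(v * v for v in w[1:]))
    return signal, noise


if __name__ == "__main__":
    X, y = make_gapped_data(P=100, d=10, gap=0.5, seed=1)
    batch_size = 1
    for lr in (0.1, 1.0):
        T = lr / batch_size  # assumed definition of the SGD temperature
        epochs, w = train_hinge_perceptron(X, y, lr, batch_size, seed=2)
        signal, noise = summarize(w)
        print(f"T={T:.1f}: epochs={epochs}, w[0]={signal:.2f}, |w[1:]|={noise:.2f}")
```

Running it prints, for each T, the number of epochs needed to fit the training set and the two weight components at the stopping point; whether the signal grows with T in this toy is something to check empirically rather than assume. The hard gap is a deliberate simplification: the abstract notes that the exponents depend on the density of data near the decision boundary, and a gap removes that density entirely.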

Related content

SGD

Not to be missed! The "100 Lectures on Machine Learning" course taught by Mark Schmidt (UBC)

Zhuanzhi Member Services

76+ reads · June 28, 2022

Causal Graphs, 52-slide deck

Zhuanzhi Member Services

252+ reads · April 19, 2020

A collection of image classification tricks, 17-slide deck: "Bag of Tricks for Image Classification"

Zhuanzhi Member Services

96+ reads · March 12, 2020

UC Berkeley CS189 lecture notes: "A Comprehensive Guide to Machine Learning", 185-page PDF

Zhuanzhi Member Services

162+ reads · January 16, 2020

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Zhuanzhi Member Services

36+ reads · October 17, 2019

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Zhuanzhi Member Services

59+ reads · October 17, 2019

Keras creator François Chollet: "Deep Learning with Python", 386-page PDF

Zhuanzhi Member Services

160+ reads · October 12, 2019

[AI in 2019: A Year in Review] Anti-AI, AI in 2019: A Year in Review

Zhuanzhi Member Services

79+ reads · October 10, 2019

[Harvard Business School course, Fall 2019] Machine learning interpretability

Zhuanzhi Member Services

105+ reads · October 9, 2019

[SIGGRAPH 2019] Deep learning with TensorFlow 2.0 for computer graphics applications

Zhuanzhi Member Services

41+ reads · October 9, 2019

VCIP 2022 Call for Special Session Proposals

CCF Technical Committee on Multimedia

1+ reads · April 1, 2022

ACM MM 2022 Call for Papers

CCF Technical Committee on Multimedia

5+ reads · March 29, 2022

A new perspective on catastrophic forgetting: balancing transfer and interference

CreateAMind

17+ reads · July 6, 2019

Hierarchically Structured Meta-learning

CreateAMind

27+ reads · May 22, 2019

Transferring Knowledge across Learning Processes

CreateAMind

29+ reads · May 18, 2019

Deep Self-Evolution Clustering

我爱读PAMI

15+ reads · April 13, 2019

Unsupervised Meta-Learning for Reinforcement Learning

CreateAMind

18+ reads · January 7, 2019

Unsupervised Learning via Meta-Learning

CreateAMind

43+ reads · January 3, 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+ reads · December 24, 2018

Capsule Networks explained

Machine Learning Research Society (机器学习研究会)

11+ reads · November 12, 2017

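Analysis of error characteristics and correction models for GB-InSAR images

National Natural Science Foundation of China

0+ reads · December 31, 2014

Key technologies of dual-core GCTs with trench and field-limiting-ring composite terminations

National Natural Science Foundation of China

0+ reads · December 31, 2014

High contrast-to-tissue ratio contrast-enhanced ultrasound imaging of the spatiotemporal heterogeneity of plaque blood perfusion

National Natural Science Foundation of China

0+ reads · December 31, 2013

All-fiber femtosecond pulsed laser technology based on Tm:Ho co-doped silica fiber in the 1.94 μm band

National Natural Science Foundation of China

0+ reads · December 31, 2012

Modeling, simulation, and verification of real-time safety-critical systems

National Natural Science Foundation of China

1+ reads · December 31, 2012

Photoacoustic non-destructive testing of surface defects in high-speed railway rails

National Natural Science Foundation of China

1+ reads · December 31, 2012

Preparation and application of carbon-nanotube-supported bimetallic nanoparticle composites

National Natural Science Foundation of China

0+ reads · December 31, 2012

Correlation between the micro/nano structure and electrocatalytic performance of platinum-loaded tungsten carbide/montmorillonite composites

National Natural Science Foundation of China

0+ reads · December 31, 2011

Fast SART-based fully 3D PET tomographic reconstruction algorithms using list-mode data

National Natural Science Foundation of China

0+ reads · December 31, 2011

Information fusion methods based on chaos, wavelets, and neural networks for integrated navigation systems

National Natural Science Foundation of China

0+ reads · December 31, 2009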

Lower Bound on the Bayesian Risk via Information Measure

arXiv

0+ reads · March 22, 2023

Fighting over-fitting with quantization for learning deep neural networks on noisy labels

arXiv

0+ reads · March 21, 2023

An Effective Multivariate Normality Test via Hessians of Empirical Cumulant Generating Functions

arXiv

0+ reads · March 20, 2023

On lower bounds for the bias-variance trade-off

arXiv

0+ reads · March 20, 2023

A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity

arXiv

0+ reads · March 19, 2023

Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization

arXiv

0+ reads · March 19, 2023

Estimating optimal treatment regimes in survival contexts using an instrumental variable

arXiv

0+ reads · March 18, 2023

The Principles of Deep Learning Theory

arXiv

66+ reads · June 18, 2021

The Modern Mathematics of Deep Learning

arXiv

49+ reads · May 9, 2021

Class-Balanced Loss Based on Effective Number of Samples

arXiv

12+ reads · January 16, 2019
