高山噪音喷射中的非对称重尾和隐含比亚 (Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections) - 专知论文

会员服务 ·

0

噪声 · 有偏 · SGD · Neural Networks · Networks ·

2021 年 2 月 13 日

Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections

翻译：高山噪音喷射中的非对称重尾和隐含比亚

Alexander Camuto,Xiaoyu Wang,Lingjiong Zhu,Chris Holmes,Mert Gürbüzbalaban,Umut Şimşekli

from arxiv, Main paper of 12 pages, followed by appendix

Gaussian noise injections (GNIs) are a family of simple and widely-used regularisation methods for training neural networks, where one injects additive or multiplicative Gaussian noise to the network activations at every iteration of the optimisation algorithm, which is typically chosen as stochastic gradient descent (SGD). In this paper we focus on the so-called `implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of SGD. We show that this effect induces an asymmetric heavy-tailed noise on SGD gradient updates. In order to model this modified dynamics, we first develop a Langevin-like stochastic differential equation that is driven by a general family of asymmetric heavy-tailed noise. Using this model we then formally prove that GNIs induce an `implicit bias', which varies depending on the heaviness of the tails and the level of asymmetry. Our empirical results confirm that different types of neural networks trained with GNIs are well-modelled by the proposed dynamics and that the implicit effect of these injections induces a bias that degrades the performance of networks.

翻译：Gausian 噪音注射(GIS)是一个简单和广泛使用的神经网络培训常规化方法组成的大家庭,其中一种为网络注入添加或倍增的Gaussian噪音,在优化算法的每一次迭代中激活网络,通常选择优化算法为随机梯度下沉(SGD)。在本文中,我们侧重于GNI所谓的“隐含效应”,这是注入噪音对SGD动态的影响。我们表明,这种效应在SGD梯度更新中诱发了不对称的重尾巴重尾巴噪音。为了模拟这一变异动态,我们首先开发了一个由非对称重尾巴噪音大家庭驱动的Langevin相似的随机偏差方程式。然后,我们用这一模型正式证明,GNI引发了“隐性偏差”,这取决于尾巴的高度和不对称程度。我们的经验结果证实,在SGDGE更新时,经过培训的不同种类的神经网络都受到拟议动态的完善的模拟,而且这些注入的隐含效果导致一种偏差,使网络的网络的性下降。

0

相关内容

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【经典书】线性代数，436页pdf

专知会员服务

78+阅读 · 2021年3月16日

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【斯坦福大学】深度学习技巧速查清单《CS 230 - Deep Learning Tips and Tricks Cheatsheet》

【斯坦福大学】深度学习技巧速查清单《CS 230 - Deep Learning Tips and Tricks Cheatsheet》

专知会员服务

29+阅读 · 2019年12月19日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

基于 Carsim 2016 和 Simulink的无人车运动控制联合仿真（四）

基于 Carsim 2016 和 Simulink的无人车运动控制联合仿真（四）

泡泡机器人SLAM

14+阅读 · 2019年4月30日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Optimal CPU Scheduling in Data Centers via a Finite-Time Distributed Quantized Coordination Mechanism

Optimal CPU Scheduling in Data Centers via a Finite-Time Distributed Quantized Coordination Mechanism

Arxiv

0+阅读 · 2021年4月7日

Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank

Arxiv

0+阅读 · 2021年4月7日

Out-of-Distribution Robustness with Deep Recursive Filters

Arxiv

0+阅读 · 2021年4月6日

Sampling Matrices from Harish-Chandra-Itzykson-Zuber Densities with Applications to Quantum Inference and Differential Privacy

Arxiv

0+阅读 · 2021年4月6日

A Latent space solver for PDE generalization

Arxiv

0+阅读 · 2021年4月6日

Asymmetric Loss For Multi-Label Classification

Arxiv

6+阅读 · 2020年9月29日

Jointly Optimizing Diversity and Relevance in Neural Response Generation

Arxiv

4+阅读 · 2019年2月28日

Testing Matrix Rank, Optimally

Arxiv

3+阅读 · 2018年10月18日

Implicit Maximum Likelihood Estimation

Implicit Maximum Likelihood Estimation

Arxiv

7+阅读 · 2018年9月24日

Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly

Arxiv

18+阅读 · 2018年1月15日

VIP会员

文章信息

相关主题

Neural Networks

相关VIP内容

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【经典书】线性代数，436页pdf

专知会员服务

78+阅读 · 2021年3月16日

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【斯坦福大学】深度学习技巧速查清单《CS 230 - Deep Learning Tips and Tricks Cheatsheet》

【斯坦福大学】深度学习技巧速查清单《CS 230 - Deep Learning Tips and Tricks Cheatsheet》

专知会员服务

29+阅读 · 2019年12月19日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《多智能体不确定环境追逃博弈研究》216页

美智库最新发布《解放军"人机编组协同作战"发展路径：理论与实践》53页

现代战争"杀伤区"理论：空间尺度与结构特征、控制手段与毁伤机制、生存策略与战线转移

《俄军无人机创新技术或已在乌克兰达成"战场空中封锁"作战效果》最新18页报告

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

基于 Carsim 2016 和 Simulink的无人车运动控制联合仿真（四）

基于 Carsim 2016 和 Simulink的无人车运动控制联合仿真（四）

泡泡机器人SLAM

14+阅读 · 2019年4月30日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Optimal CPU Scheduling in Data Centers via a Finite-Time Distributed Quantized Coordination Mechanism

Optimal CPU Scheduling in Data Centers via a Finite-Time Distributed Quantized Coordination Mechanism

Arxiv

0+阅读 · 2021年4月7日

Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank

Arxiv

0+阅读 · 2021年4月7日

Out-of-Distribution Robustness with Deep Recursive Filters

Arxiv

0+阅读 · 2021年4月6日

Sampling Matrices from Harish-Chandra-Itzykson-Zuber Densities with Applications to Quantum Inference and Differential Privacy

Arxiv

0+阅读 · 2021年4月6日

A Latent space solver for PDE generalization

Arxiv

0+阅读 · 2021年4月6日

Asymmetric Loss For Multi-Label Classification

Arxiv

6+阅读 · 2020年9月29日

Jointly Optimizing Diversity and Relevance in Neural Response Generation

Arxiv

4+阅读 · 2019年2月28日

Testing Matrix Rank, Optimally

Arxiv

3+阅读 · 2018年10月18日

Implicit Maximum Likelihood Estimation

Implicit Maximum Likelihood Estimation

Arxiv

7+阅读 · 2018年9月24日

Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly

Arxiv

18+阅读 · 2018年1月15日

微信扫码咨询专知VIP会员