Gaussian noise injections (GNIs) are a family of simple and widely used regularisation methods for training neural networks, in which additive or multiplicative Gaussian noise is injected into the network activations at every iteration of the optimisation algorithm, typically stochastic gradient descent (SGD). In this paper we focus on the so-called `implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of SGD. We show that this effect induces an asymmetric heavy-tailed noise on SGD gradient updates. To model these modified dynamics, we first develop a Langevin-like stochastic differential equation that is driven by a general family of asymmetric heavy-tailed noise. Using this model, we then formally prove that GNIs induce an `implicit bias' that varies with the heaviness of the tails and the level of asymmetry. Our empirical results confirm that different types of neural networks trained with GNIs are well modelled by the proposed dynamics, and that the implicit effect of these injections induces a bias that degrades the performance of the networks.
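To make the injection scheme concrete, below is a minimal PyTorch sketch of a GNI layer that perturbs activations at every training iteration, supporting both the additive and multiplicative variants mentioned above. The module name, the `sigma` parameter, and the `multiplicative` switch are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GaussianNoiseInjection(nn.Module):
    """Hypothetical sketch of a GNI layer: perturbs activations with
    Gaussian noise during training and is the identity at evaluation."""

    def __init__(self, sigma: float = 0.1, multiplicative: bool = False):
        super().__init__()
        self.sigma = sigma                  # noise scale (assumed hyperparameter)
        self.multiplicative = multiplicative

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:               # inject noise only while optimising
            return x
        noise = self.sigma * torch.randn_like(x)  # eps ~ N(0, sigma^2)
        if self.multiplicative:
            return x * (1.0 + noise)        # multiplicative: x * (1 + eps)
        return x + noise                    # additive: x + eps

# Usage: interleave injections with layers so that every activation
# is perturbed at each SGD iteration.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(), GaussianNoiseInjection(sigma=0.1),
    nn.Linear(256, 10),
)
```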
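As a worked illustration of the modelling step, one plausible form of the Langevin-like SDE described above replaces the Brownian driver of classical Langevin dynamics with an asymmetric heavy-tailed process. The specific choice of an asymmetric $\alpha$-stable Lévy driver and the symbols below ($\theta_t$ for the parameters, $f$ for the loss, $\sigma$ for the noise scale) are assumptions for the sketch, not the paper's exact formulation.

```latex
% Illustrative Langevin-like dynamics driven by an asymmetric
% alpha-stable Levy process L_t^{\alpha,\beta}, where \alpha controls
% the heaviness of the tails and \beta the level of asymmetry:
\begin{equation}
  \mathrm{d}\theta_t
    = -\nabla f(\theta_t)\,\mathrm{d}t
      + \sigma\,\mathrm{d}L_t^{\alpha,\beta},
  \qquad \alpha \in (0, 2],\quad \beta \in [-1, 1].
\end{equation}
```

With $\alpha = 2$ and $\beta = 0$ the driver reduces to Brownian motion and the equation recovers standard Langevin dynamics; smaller $\alpha$ gives heavier tails, and nonzero $\beta$ skews the noise, which is the regime the implicit-bias result concerns.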