Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization


Local curvature · Generalization theory · Trace · Curvature · Regularization term

May 31, 2021


Stanislaw Jastrzebski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof Geras

From arXiv. The last two authors contributed equally. Accepted to the International Conference on Machine Learning (ICML) 2021.

The early phase of training of deep neural networks has a dramatic effect on the local curvature of the loss function. For instance, using a small learning rate does not guarantee stable optimization because the optimization trajectory has a tendency to steer towards regions of the loss surface with increasing local curvature. We ask whether this tendency is connected to the widely observed phenomenon that the choice of the learning rate strongly influences generalization. We first show that stochastic gradient descent (SGD) implicitly penalizes the trace of the Fisher Information Matrix (FIM), a measure of the local curvature, from the beginning of training. We argue it is an implicit regularizer in SGD by showing that explicitly penalizing the trace of the FIM can significantly improve generalization. We highlight that poor final generalization coincides with the trace of the FIM increasing to a large value early in training, to which we refer as catastrophic Fisher explosion. Finally, to gain insight into the regularization effect of penalizing the trace of the FIM, we show that it limits memorization by reducing the learning speed of examples with noisy labels more than that of the clean examples.
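The central quantity in the abstract, the trace of the Fisher Information Matrix, can be made concrete on a small model. The sketch below assumes a linear softmax classifier without a bias term, the standard definition Tr(F) = E_x E_{ŷ ~ p_θ(·|x)} ||∇_θ log p_θ(ŷ|x)||², and an explicit penalty of the form cross-entropy + α·Tr(F). All function names and the linear model are hypothetical illustrations, not the authors' code; the paper's own Fisher penalty estimator for deep networks may differ, for example in how labels are sampled and how gradient norms are aggregated per mini-batch.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def per_example_grad_sq_norm(W, X, labels):
    """Squared Frobenius norm of grad_W log p(label | x) for each row of X.

    Assumes logits z = x @ W.T with W of shape (C, D). Then
    grad_W log p(y|x) = (e_y - p) x^T, whose squared norm factorizes
    as ||e_y - p||^2 * ||x||^2.
    """
    probs = softmax(X @ W.T)                 # (N, C)
    onehot = np.eye(W.shape[0])[labels]      # (N, C)
    residual = onehot - probs
    return (residual ** 2).sum(axis=1) * (X ** 2).sum(axis=1)


def fisher_trace_exact(W, X):
    """Tr(F): mean over inputs, expectation over labels drawn from the model.

    Summing p_c * ||e_c - p||^2 over classes c simplifies to 1 - ||p||^2.
    """
    probs = softmax(X @ W.T)
    return np.mean((1.0 - (probs ** 2).sum(axis=1)) * (X ** 2).sum(axis=1))


def fisher_trace_sampled(W, X, rng, n_samples=1):
    """Monte Carlo estimate of Tr(F) using labels sampled from p(y|x)."""
    probs = softmax(X @ W.T)
    n_classes = W.shape[0]
    estimates = []
    for _ in range(n_samples):
        # Inverse-CDF sampling: one label per row from its predictive
        # distribution; clip guards against cumsum rounding just below 1.
        u = rng.random((X.shape[0], 1))
        labels = np.minimum((probs.cumsum(axis=1) < u).sum(axis=1), n_classes - 1)
        estimates.append(per_example_grad_sq_norm(W, X, labels).mean())
    return float(np.mean(estimates))


def cross_entropy(W, X, y):
    """Mean negative log-likelihood of the true labels y."""
    probs = softmax(X @ W.T)
    return -np.mean(np.log(probs[np.arange(len(y)), y]))


def penalized_loss(W, X, y, alpha):
    """Hypothetical explicit regularizer: cross-entropy + alpha * Tr(F)."""
    return cross_entropy(W, X, y) + alpha * fisher_trace_exact(W, X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 4))
    y = rng.integers(0, 3, size=8)
    W = rng.normal(size=(3, 4))
    print(fisher_trace_exact(W, X), fisher_trace_sampled(W, X, rng, n_samples=500))
    print(penalized_loss(W, X, y, alpha=0.1))
```

The closed form 1 - ||p||² shows why the trace shrinks as predictions become confident on inputs with bounded norm; the sampled estimator is the version that would generalize to models where no closed form exists, at the cost of Monte Carlo noise.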

