Weight decay is a popular regularization technique for training deep neural networks. Modern deep learning libraries mainly use $L_{2}$ regularization as the default implementation of weight decay. \citet{loshchilov2018decoupled} demonstrated that $L_{2}$ regularization is not identical to weight decay for adaptive gradient methods, such as Adaptive Moment Estimation (Adam), and proposed Adam with Decoupled Weight Decay (AdamW). However, we find that the popular implementations of weight decay in modern deep learning libraries, including $L_{2}$ regularization and decoupled weight decay, often damage performance. First, $L_{2}$ regularization acts as an unstable form of weight decay for all optimizers that use momentum, such as stochastic gradient descent (SGD) with momentum. Second, decoupled weight decay is highly unstable for all adaptive gradient methods. We further propose the Stable Weight Decay (SWD) method to fix the unstable weight decay problem from a dynamical perspective. The proposed SWD method yields significant improvements over $L_{2}$ regularization and decoupled weight decay in our experiments. Simply fixing weight decay in Adam via SWD, with no extra hyperparameter, usually outperforms complex Adam variants that have more hyperparameters.
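As a minimal illustration of the distinction discussed above (a simplified sketch in the standard Adam notation of \citet{loshchilov2018decoupled}, omitting learning rate schedules: $\hat{m}_t$ and $\hat{v}_t$ denote the bias-corrected first and second moment estimates computed from the gradient $g_t$, $\eta$ the learning rate, $\lambda$ the weight decay coefficient, and $\epsilon$ a small constant), $L_{2}$ regularization injects the decay term into the gradient, so it is rescaled by the adaptive preconditioner, whereas decoupled weight decay subtracts it from the parameters directly:
\begin{align*}
\text{$L_{2}$ regularization:} \quad & g_t = \nabla f_t(\theta_{t-1}) + \lambda \theta_{t-1}, & \theta_t &= \theta_{t-1} - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}, \\
\text{Decoupled weight decay:} \quad & g_t = \nabla f_t(\theta_{t-1}), & \theta_t &= \theta_{t-1} - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} - \eta \lambda \theta_{t-1}.
\end{align*}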