AdaL: Adaptive Gradient Transformation Contributes to Convergences and Generalizations - Zhuanzhi Papers


Generalization theory · Transformation · Optimizer · Smoothing · Noise

July 4, 2021

AdaL: Adaptive Gradient Transformation Contributes to Convergences and Generalizations


Hongwei Zhang,Weidong Zou,Hongbo Zhao,Qi Ming,Tijin Yan,Yuanqing Xia,Weipeng Cao

Adaptive optimization methods are widely used in deep learning. They scale the learning rate adaptively according to past gradients, which has been shown to accelerate convergence. However, they generalize worse than SGD. Recent studies point out that smoothing exponential gradient noise leads to degraded generalization. Inspired by this, we propose AdaL, which applies a transformation to the original gradient. AdaL accelerates convergence by amplifying the gradient in the early stage, and it dampens oscillation and stabilizes optimization by shrinking the gradient later. This modification alleviates the smoothness of the gradient noise, which produces better generalization performance. We theoretically prove the convergence of AdaL and demonstrate its effectiveness on several benchmarks.
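The abstract does not state the transformation itself, so the following is only a minimal sketch of the "amplify early, shrink later" idea and not AdaL's actual formula. It assumes a hypothetical gain that decays exponentially from a value above 1 to a value below 1, and it applies that gain inside plain gradient descent rather than an adaptive optimizer. The names `gain`, `transform_gradient`, `minimize` and the parameters `early`, `late`, `tau` are illustrative assumptions.

```python
import math


def gain(step, early=2.0, late=0.5, tau=20.0):
    """Hypothetical gain: equals `early` (> 1) at step 0 and decays toward `late` (< 1)."""
    return late + (early - late) * math.exp(-step / tau)


def transform_gradient(grad, step):
    """Scale the raw gradient by the step-dependent gain (illustrative, not AdaL's formula)."""
    return gain(step) * grad


def minimize(grad_fn, x0, lr=0.1, steps=200):
    """Plain gradient descent that consumes the transformed gradient."""
    x = x0
    for t in range(steps):
        x -= lr * transform_gradient(grad_fn(x), t)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); its minimum is at x = 3.
x_final = minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

With these assumed defaults the effective step is larger than the nominal learning rate during roughly the first 20 iterations and smaller afterwards, which mirrors the early acceleration and later damping described in the abstract.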



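Related content

Generalization theory

[ICML 2021] Heterogeneous Risk Minimization

Zhuanzhi Member Service

16+ reads · May 21, 2021

Algorithmic Convex Geometry, a concise book (50-page PDF)

Zhuanzhi Member Service

42+ reads · April 2, 2021

INRIA's latest Machine Learning Theory course notes (176-page PDF)

Zhuanzhi Member Service

51+ reads · December 14, 2020

[Google] Adversarial Robustness in Deep Learning (43-page PPT)

Zhuanzhi Member Service

45+ reads · October 31, 2020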

A simple tutorial on Graph Neural Networks (28-page PPT)

Zhuanzhi Member Service

126+ reads · August 2, 2020

[ICML 2020] On the Generalization Benefit of Noise in Stochastic Gradient Descent

Zhuanzhi Member Service

19+ reads · June 29, 2020

A Game Theoretic Framework for Model Based Reinforcement Learning

Zhuanzhi Member Service

131+ reads · April 19, 2020

[Course] Princeton University, Spring 2019: lecture notes for Machine Learning Optimization

Zhuanzhi Member Service

85+ reads · October 29, 2019

Latest reinforcement learning tutorial (17-page PDF)

Zhuanzhi Member Service

182+ reads · October 11, 2019

[SIGGRAPH 2019] TensorFlow 2.0 deep learning applications in computer graphics

Zhuanzhi Member Service

41+ reads · October 9, 2019

Transferring Knowledge across Learning Processes

CreateAMind

29+ reads · May 18, 2019

Call for Participation: Shared Tasks in NLPCC 2019

China Computer Federation

5+ reads · March 22, 2019

Unsupervised Meta-Learning for Reinforcement Learning

CreateAMind

18+ reads · January 7, 2019

Unsupervised meta-learning for representation learning

CreateAMind

27+ reads · January 4, 2019

Unsupervised Learning via Meta-Learning

CreateAMind

43+ reads · January 3, 2019

Meta learning in 2017: MAML, SNAIL

CreateAMind

11+ reads · January 2, 2019

A discussion of the disentanglement assumption

CreateAMind

9+ reads · December 10, 2018

Hierarchical Disentangled Representations

CreateAMind

4+ reads · April 15, 2018

Auto-Encoding GAN

CreateAMind

7+ reads · August 4, 2017

[New today] 8 IEEE Trans. special issue submission deadlines

Call4Papers

7+ reads · June 29, 2017

On the Convergence of Decentralized Adaptive Gradient Methods

arXiv

0+ reads · September 7, 2021

Fishr: Invariant Gradient Variances for Out-of-distribution Generalization

arXiv

1+ reads · September 7, 2021

Towards Memory-Efficient Neural Networks via Multi-Level in situ Generation

arXiv

0+ reads · September 5, 2021

Imposing Relation Structure in Language-Model Embeddings Using Contrastive Learning

arXiv

0+ reads · September 4, 2021

Towards Out-Of-Distribution Generalization: A Survey

arXiv

38+ reads · August 31, 2021

Relative Positional Encoding for Transformers with Linear Complexity

arXiv

8+ reads · May 18, 2021

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

arXiv

15+ reads · December 3, 2020

Compositional Generalization in Image Captioning

arXiv

3+ reads · September 16, 2019

Learning to Importance Sample in Primary Sample Space

arXiv

5+ reads · August 23, 2018

Feasibility Based Large Margin Nearest Neighbor Metric Learning

arXiv

3+ reads · May 2, 2018
