Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the dying gradient problem of SGD. Nevertheless, existing optimizers are still unable to exploit the curvature information of the optimization landscape efficiently. This paper proposes a new AngularGrad optimizer that considers the behavior of the angle between consecutive gradients. This is the first attempt in the literature to exploit the gradient angular information in addition to its magnitude. The proposed AngularGrad generates a score to control the step size based on the gradient angular information of previous iterations. Thus, the optimization steps become smoother, as a more accurate step size is captured through the angular information of the immediate past gradients. Two variants of AngularGrad are developed, based on whether the tangent or the cosine function is used to compute the gradient angular information. Theoretically, AngularGrad exhibits the same regret bound as Adam for convergence purposes. Nevertheless, extensive experiments conducted on benchmark datasets against state-of-the-art methods reveal the superior performance of AngularGrad. The source code will be made publicly available at: https://github.com/mhaut/AngularGrad.
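As a rough illustration of the idea described above, the sketch below implements an Adam-style update whose step size is scaled by a score derived from the angle between consecutive gradients, with a "tan" and a "cos" mode mirroring the two variants. This is a minimal sketch under stated assumptions, not the paper's exact algorithm: the class name, the tanh squashing, and the 0.5 offset/scale of the score are illustrative choices; the precise coefficients should be taken from the reference implementation at the linked repository.

```python
import numpy as np

class AngularGradSketch:
    """Illustrative Adam-style optimizer whose step size is modulated by
    the angle between consecutive gradients. The exact form of the angular
    score below is an assumption for illustration, not the paper's formula."""

    def __init__(self, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, mode="cos"):
        self.lr = lr
        self.b1, self.b2 = betas
        self.eps = eps
        self.mode = mode          # "tan" or "cos" variant
        self.m = None             # first-moment estimate (as in Adam)
        self.v = None             # second-moment estimate (as in Adam)
        self.prev_grad = None     # gradient from the previous iteration
        self.t = 0

    def _angular_score(self, grad):
        """Map the angle between consecutive gradients to a bounded score."""
        if self.prev_grad is None:
            return 1.0            # no history yet: plain Adam step
        g, p = grad.ravel(), self.prev_grad.ravel()
        cos_theta = np.clip(
            g @ p / (np.linalg.norm(g) * np.linalg.norm(p) + self.eps),
            -1.0, 1.0)
        theta = np.arccos(cos_theta)
        raw = np.tan(theta) if self.mode == "tan" else np.cos(theta)
        # tanh keeps the score bounded; the 0.5 offset/scale (giving a score
        # in (0.5, 1.0]) is an assumed squashing, chosen for illustration.
        return 0.5 * np.tanh(abs(raw)) + 0.5

    def step(self, params, grad):
        self.t += 1
        if self.m is None:
            self.m = np.zeros_like(grad)
            self.v = np.zeros_like(grad)
        score = self._angular_score(grad)
        self.prev_grad = grad.copy()
        # Standard Adam moment updates with bias correction.
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        # The angular score modulates the effective step size.
        return params - self.lr * score * m_hat / (np.sqrt(v_hat) + self.eps)

# Usage: minimize f(x) = ||x||^2, whose gradient is 2x.
opt = AngularGradSketch(lr=0.1, mode="tan")
x = np.array([3.0, -2.0])
for _ in range(200):
    x = opt.step(x, 2 * x)
print(x)  # approaches the origin
```

In this toy run, consecutive gradients stay nearly collinear, so the angle is small and the score settles near its lower bound, yielding a consistently damped (smoother) step, which matches the abstract's intuition that the angular information of past gradients stabilizes the step size.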