The best performing Binary Neural Networks (BNNs) are usually attained using Adam optimization and its multi-step training variants. However, to the best of our knowledge, few studies explore the fundamental reasons why Adam is superior to other optimizers like SGD for BNN optimization or provide analytical explanations that support specific training strategies. To address this, in this paper we first investigate the trajectories of gradients and weights in BNNs during the training process. We show the regularization effect of second-order momentum in Adam is crucial to revitalize the weights that are dead due to the activation saturation in BNNs. We find that Adam, through its adaptive learning rate strategy, is better equipped to handle the rugged loss surface of BNNs and reaches a better optimum with higher generalization ability. Furthermore, we inspect the intriguing role of the real-valued weights in binary networks, and reveal the effect of weight decay on the stability and sluggishness of BNN optimization. Through extensive experiments and analysis, we derive a simple training scheme, building on existing Adam-based optimization, which achieves 70.5% top-1 accuracy on the ImageNet dataset using the same architecture as the state-of-the-art ReActNet, a 1.1% higher accuracy. Code and models are available at https://github.com/liuzechun/AdamBNN.
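For reference, the claim about second-order momentum can be read off the standard Adam update rule (standard Kingma–Ba notation; the hyper-parameters $\alpha$, $\beta_1$, $\beta_2$, and $\epsilon$ below are generic and not values prescribed by this work):
\[
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \qquad
\theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},
\]
with $\hat{m}_t = m_t/(1-\beta_1^t)$ and $\hat{v}_t = v_t/(1-\beta_2^t)$. Because the step is normalized by $\sqrt{\hat{v}_t}$, a weight whose gradient $g_t$ stays persistently small due to activation saturation still receives updates of a magnitude comparable to other weights (roughly bounded by $\alpha$), whereas under plain SGD its step is directly proportional to the small gradient and the weight effectively stops moving, which is the "dead weight" behavior referred to above.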