球形动态动态:正常化、体重衰减和SGD的神经网络学习动态 (Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD) - 专知论文

会员服务 ·

0

Weight · 权重衰减 · Neural Networks · SGD · Networking ·

2020 年 11 月 27 日

Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD

翻译：球形动态动态:正常化、体重衰减和SGD的神经网络学习动态

Ruosi Wan,Zhanxing Zhu,Xiangyu Zhang,Jian Sun

from arxiv, Theoretical analysis on joint effect of normalization and weight decay

In this work, we comprehensively reveal the learning dynamics of neural network with normalization, weight decay (WD), and SGD (with momentum), named as Spherical Motion Dynamics (SMD). Most related works study SMD by focusing on "effective learning rate" in "equilibrium" condition, where weight norm remains unchanged. However, their discussions on why equilibrium condition can be reached in SMD is either absent or less convincing. Our work investigates SMD by directly exploring the cause of equilibrium condition. Specifically, 1) we introduce the assumptions that can lead to equilibrium condition in SMD, and prove that weight norm can converge at linear rate with given assumptions; 2) we propose "angular update" as a substitute for effective learning rate to measure the evolving of neural network in SMD, and prove angular update can also converge to its theoretical value at linear rate; 3) we verify our assumptions and theoretical results on various computer vision tasks including ImageNet and MSCOCO with standard settings. Experiment results show our theoretical findings agree well with empirical observations.

翻译：在这项工作中,我们全面揭示了神经网络的学习动态,这些神经网络具有正常化、重量衰减(WD)和SGD(动力),称为球状运动动力(SMD),大多数相关作品都通过侧重于“平衡”条件下的“有效学习率”来研究SMD,其重量标准保持不变。然而,关于为什么在SMD中达到平衡条件的讨论要么不存在,要么不那么令人信服。我们的工作通过直接探索平衡状况的原因对SMD进行调查。具体地说,1我们引入了可能导致SMD中平衡状况的假设,并证明重量标准可以以线性速度与特定假设趋同;2我们建议用“角更新”替代有效学习率来衡量SMD中神经网络的演变,并证明角性更新也可以以线性速度与其理论价值趋同;3我们核查了我们关于各种计算机视觉任务的假设和理论结果,包括图像网和MCCO与标准环境。实验结果表明我们的理论结果与经验观察非常一致。

0

相关内容

Weight

【NeurIPS 2020】学习神经网络中的不变性

专知会员服务

29+阅读 · 2020年10月24日

【ICML 2020】设置LayerNorm使Transformer加速收敛

专知会员服务

16+阅读 · 2020年7月27日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

【斯坦福大学】Gradient Surgery for Multi-Task Learning

【斯坦福大学】Gradient Surgery for Multi-Task Learning

专知会员服务

47+阅读 · 2020年1月23日

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

专知会员服务

15+阅读 · 2020年1月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Network Diffusions via Neural Mean-Field Dynamics

Arxiv

0+阅读 · 2021年1月19日

Monomial-agnostic computation of vanishing ideals

Arxiv

0+阅读 · 2021年1月19日

Motion Generation Using Bilateral Control-Based Imitation Learning with Autoregressive Learning

Arxiv

0+阅读 · 2021年1月19日

Stable Recovery of Entangled Weights: Towards Robust Identification of Deep Neural Networks from Minimal Samples

Arxiv

0+阅读 · 2021年1月18日

Deep learning of contagion dynamics on complex networks

Arxiv

0+阅读 · 2021年1月15日

Dynamic Normalization

Arxiv

0+阅读 · 2021年1月15日

Optimization for deep learning: theory and algorithms

Optimization for deep learning: theory and algorithms

Arxiv

106+阅读 · 2019年12月19日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Reducing Parameter Space for Neural Network Training

Arxiv

3+阅读 · 2018年8月17日

Asynchronous Byzantine Machine Learning (the case of SGD)

Arxiv

3+阅读 · 2018年7月9日

VIP会员

文章信息

相关主题

Neural Networks

相关VIP内容

【NeurIPS 2020】学习神经网络中的不变性

专知会员服务

29+阅读 · 2020年10月24日

【ICML 2020】设置LayerNorm使Transformer加速收敛

专知会员服务

16+阅读 · 2020年7月27日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

【斯坦福大学】Gradient Surgery for Multi-Task Learning

【斯坦福大学】Gradient Surgery for Multi-Task Learning

专知会员服务

47+阅读 · 2020年1月23日

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

专知会员服务

15+阅读 · 2020年1月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《人工智能绝不能完全自主》

《人工智能的法律与伦理：军事自主机器独特挑战的深度剖析》316页

从数据到主导：AI与兵棋推演构筑决策优势

《特洛伊木马货柜：武器化集装箱的战略威胁》最新报告

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Network Diffusions via Neural Mean-Field Dynamics

Arxiv

0+阅读 · 2021年1月19日

Monomial-agnostic computation of vanishing ideals

Arxiv

0+阅读 · 2021年1月19日

Motion Generation Using Bilateral Control-Based Imitation Learning with Autoregressive Learning

Arxiv

0+阅读 · 2021年1月19日

Stable Recovery of Entangled Weights: Towards Robust Identification of Deep Neural Networks from Minimal Samples

Arxiv

0+阅读 · 2021年1月18日

Deep learning of contagion dynamics on complex networks

Arxiv

0+阅读 · 2021年1月15日

Dynamic Normalization

Arxiv

0+阅读 · 2021年1月15日

Optimization for deep learning: theory and algorithms

Optimization for deep learning: theory and algorithms

Arxiv

106+阅读 · 2019年12月19日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Reducing Parameter Space for Neural Network Training

Arxiv

3+阅读 · 2018年8月17日

Asynchronous Byzantine Machine Learning (the case of SGD)

Arxiv

3+阅读 · 2018年7月9日

微信扫码咨询专知VIP会员