Implicit regularization is important for understanding the learning of neural networks (NNs). Empirical works show that, with small initialization, the input weights of hidden neurons (the input weight of a hidden neuron consists of the weight from the input layer to that neuron and its bias term) condense on isolated orientations. This condensation dynamics implies that training implicitly regularizes a NN towards one with a much smaller effective size. In this work, we utilize multilayer networks to show that the maximal number of condensed orientations in the initial training stage is twice the multiplicity of the activation function, where "multiplicity" refers to the multiplicity of the root of the activation function at the origin. Our theoretical analysis confirms experiments in two cases: one is an activation function of multiplicity one with input of arbitrary dimension, which covers many common activation functions, and the other is a layer with one-dimensional input and arbitrary multiplicity. This work makes a step towards understanding how small initialization implicitly leads NNs to condensation in the initial training stage, which lays a foundation for the future study of the nonlinear dynamics of NNs and their implicit regularization effect at a later stage of training.
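A hedged formalization of the multiplicity notion referenced above, assuming the standard order-of-zero reading (the precise technical conditions in the full text may differ): an activation $\sigma$ is said to have multiplicity $p \geq 1$ if
\[
  \sigma^{(k)}(0) = 0 \quad \text{for } k = 0, 1, \dots, p-1, \qquad \sigma^{(p)}(0) \neq 0 .
\]
Under this reading, $\tanh(x)$ has multiplicity $1$ while $x\tanh(x)$ has multiplicity $2$, so the stated bound on the number of condensed orientations in the initial stage would be $2$ and $4$, respectively.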