In recent years, a critical initialization scheme based on orthogonal weights has been proposed for deep nonlinear networks. Orthogonal weights are crucial for achieving dynamical isometry in random networks, where the entire spectrum of singular values of the input-output Jacobian concentrates around one. Strong empirical evidence that orthogonal initialization speeds up training relative to Gaussian initialization, in linear networks and in the linear regime of nonlinear networks, has attracted great interest. One recent work has proven the benefit of orthogonal initialization in linear networks. However, the dynamics behind it have not been revealed for nonlinear networks. In this work, we study the Neural Tangent Kernel (NTK), which describes gradient-descent training of wide networks, for orthogonally initialized, wide, fully-connected, nonlinear networks. We prove that the NTKs of Gaussian and orthogonal weights are equal when the network width is infinite, which implies that the training speedup from orthogonal initialization is a finite-width effect in the small-learning-rate regime. We then show that during training, the NTK of an infinite-width network with orthogonal initialization stays constant in theory and, empirically, varies at a rate of the same order as that of Gaussian networks as the width tends to infinity. Finally, we conduct a thorough empirical investigation of training speed on the CIFAR-10 dataset and show that the benefit of orthogonal initialization lies in the large learning rate and large depth regime, within the linear regime of nonlinear networks.
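To make the notion of dynamical isometry concrete, the following is a minimal NumPy sketch (not part of the paper; all names are illustrative) that contrasts the input-output Jacobian spectrum of a deep linear network under orthogonal versus Gaussian initialization. For a linear network the Jacobian is just the product of the weight matrices, so a product of orthogonal matrices has every singular value exactly one, whereas the Gaussian spectrum spreads out with depth.

```python
import numpy as np

def orthogonal(n, rng):
    # Haar-random orthogonal matrix via QR decomposition of a Gaussian matrix,
    # with signs fixed so the distribution is uniform over the orthogonal group.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(0)
width, depth = 512, 10

# Deep linear network with orthogonal weights: the input-output Jacobian is
# the product of the weight matrices, itself orthogonal.
jac_orth = np.eye(width)
for _ in range(depth):
    jac_orth = orthogonal(width, rng) @ jac_orth
s_orth = np.linalg.svd(jac_orth, compute_uv=False)
print("orthogonal:", s_orth.min(), s_orth.max())  # all singular values ~ 1

# Gaussian weights with variance 1/width keep the mean squared singular value
# near one, but the spectrum spreads out with depth: no dynamical isometry.
jac_gauss = np.eye(width)
for _ in range(depth):
    jac_gauss = (rng.standard_normal((width, width)) / np.sqrt(width)) @ jac_gauss
s_gauss = np.linalg.svd(jac_gauss, compute_uv=False)
print("gaussian:  ", s_gauss.min(), s_gauss.max())
```

In the nonlinear case this isometry holds only approximately, in the linear regime near criticality, which is the setting the empirical results above refer to.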