On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization

In recent years, orthogonal initialization has been proposed as a critical initialization scheme for deep nonlinear networks. Orthogonal weights are crucial for achieving dynamical isometry in random networks, where the entire spectrum of singular values of the input-output Jacobian is concentrated around one. Strong empirical evidence that orthogonal initialization, in linear networks and in the linear regime of nonlinear networks, can speed up training compared with Gaussian initialization has attracted great interest. One recent work has proven the benefit of orthogonal initialization in linear networks. However, the dynamics behind it have not been revealed for nonlinear networks. In this work, we study the Neural Tangent Kernel (NTK), which describes the dynamics of gradient descent training of wide networks, and focus on fully-connected nonlinear networks with orthogonal initialization. We prove that the NTKs of networks with Gaussian and orthogonal weights are equal when the network width is infinite, which leads to the conclusion that the training speed-up from orthogonal initialization is a finite-width effect in the small learning rate regime. We then find that, during training, the NTK of an infinite-width network with orthogonal initialization stays constant theoretically and, empirically, varies at a rate of the same order as that of Gaussian-initialized networks as the width tends to infinity. Finally, we conduct a thorough empirical investigation of training speed on the CIFAR10 dataset and show that the benefit of orthogonal initialization lies in the regime of large learning rate and large depth, within the linear regime of nonlinear networks.
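As a rough illustration of the dynamical isometry property mentioned above, the following minimal NumPy sketch compares the singular values of the input-output Jacobian of a deep linear network under orthogonal and Gaussian initialization. It is not taken from the paper: the function names, the depth and width values, and the restriction to a purely linear network (where the Jacobian is simply the product of the weight matrices) are all assumptions made for illustration.

```python
import numpy as np

def orthogonal(n, rng):
    # Hypothetical helper: draw an n x n orthogonal matrix via QR of a
    # Gaussian matrix, with a sign correction on the columns of Q.
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    return q * np.sign(np.diag(r))

def gaussian(n, rng):
    # Hypothetical helper: i.i.d. Gaussian entries with variance 1/n.
    return rng.standard_normal((n, n)) / np.sqrt(n)

def jacobian_singular_values(init, depth, width, seed=0):
    # For a deep *linear* network (an assumption of this sketch), the
    # input-output Jacobian is the product of the layer weight matrices.
    rng = np.random.default_rng(seed)
    jac = np.eye(width)
    for _ in range(depth):
        jac = init(width, rng) @ jac
    return np.linalg.svd(jac, compute_uv=False)

sv_orth = jacobian_singular_values(orthogonal, depth=20, width=64)
sv_gauss = jacobian_singular_values(gaussian, depth=20, width=64)

print("orthogonal: min/max singular value =", sv_orth.min(), sv_orth.max())
print("gaussian:   min/max singular value =", sv_gauss.min(), sv_gauss.max())
```

Under these assumptions, the product of orthogonal matrices is itself orthogonal, so its singular values all equal one up to floating-point error, while the Gaussian product has a widely spread spectrum. The nonlinear case studied in the abstract would additionally require the activation derivatives in the Jacobian, which this sketch omits.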
