Expressiveness and generalization of deep models were recently addressed via the connection between neural networks (NNs) and kernel learning, where the first-order dynamics of a NN during gradient-descent (GD) optimization were related to the gradient similarity kernel, also known as the Neural Tangent Kernel (NTK). In the majority of works this kernel is considered time-invariant, with its properties defined entirely by the NN architecture and independent of the learning task at hand. In contrast, in this paper we empirically explore these properties throughout the optimization and show that in practical applications the NTK changes in a dramatic and meaningful way, with its top eigenfunctions aligning toward the target function learned by the NN. Moreover, these top eigenfunctions serve as basis functions for the NN output: the function represented by the NN is spanned almost completely by them for the entire optimization process. Further, since learning along the top eigenfunctions is typically fast, their alignment with the target function improves the overall optimization performance. In addition, we study how the neural spectrum is affected by learning rate decay, as commonly applied by practitioners, and show various trends in the kernel behavior. We argue that the presented phenomena may lead to a more complete theoretical understanding of NN learning.
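For reference, a minimal sketch of the objects discussed above, in notation that may differ from the paper's: for a network $f_{\theta}$ with parameters $\theta_t$ at optimization time $t$, the gradient similarity (NTK) kernel and its spectral decomposition are
$$
\Theta_t(x, x') \;=\; \nabla_{\theta} f_{\theta_t}(x)^{\top}\, \nabla_{\theta} f_{\theta_t}(x'),
\qquad
\Theta_t(x, x') \;=\; \sum_i \lambda_i(t)\, \phi_i^{(t)}(x)\, \phi_i^{(t)}(x'),
$$
and, under squared loss, the first-order GD dynamics read $\partial_t f_{\theta_t}(x) \approx -\eta \sum_j \Theta_t(x, x_j)\,\big(f_{\theta_t}(x_j) - y_j\big)$. The alignment referred to above can then be quantified, for instance, through the projections $\langle \phi_i^{(t)}, y \rangle$ of the target function $y$ onto the top eigenfunctions $\phi_i^{(t)}$.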