The study of Neural Tangent Kernels (NTKs) has provided much-needed insight into the convergence and generalization properties of neural networks in the over-parametrized (wide) limit by approximating the network using a first-order Taylor expansion with respect to its weights in a neighborhood of their initialization values. This allows neural network training to be analyzed from the perspective of reproducing kernel Hilbert spaces (RKHS), which is informative in the over-parametrized regime but a poor approximation for narrower networks, whose weights change more during training. Our goal is to extend beyond the limits of NTK toward a more general theory. We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights as an inner product of two feature maps, from data space and weight-step space respectively, into feature space, allowing neural network training to be analyzed from the perspective of reproducing kernel {\em Banach} space (RKBS). We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning in RKBS. Using this, we present a novel bound on uniform convergence in which the iteration count and learning rate play a central role, giving new theoretical insight into neural network training.
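To make the contrast concrete, the display below sketches, in schematic notation not taken from the paper, the first-order NTK linearization alongside the kind of exact feature-map factorization described above; the symbols $f$, $w_0$, $\Delta w$, $\phi$, and $\psi$ are illustrative placeholders rather than the paper's definitions.
\[
\underbrace{f(x;\,w_0 + \Delta w) \;\approx\; f(x;\,w_0) + \nabla_w f(x;\,w_0)^{\top} \Delta w}_{\text{NTK: first-order Taylor expansion about } w_0}
\qquad\text{vs.}\qquad
\underbrace{f(x;\,w_0 + \Delta w) \;=\; \big\langle \phi(x),\, \psi(\Delta w) \big\rangle}_{\text{exact power-series factorization (RKBS view)}}
\]
The left-hand approximation underlies the RKHS analysis and degrades as $\Delta w$ grows, whereas the right-hand factorization is exact on a finite neighborhood of $w_0$ and supports the RKBS treatment of the gradient-descent training sequence.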