The Neural Tangent Kernel (NTK) has recently attracted intense study, as it describes the evolution of an over-parameterized Neural Network (NN) trained by gradient descent. However, it is now well-known that gradient descent is not always a good optimizer for NNs, which can partially explain the unsatisfactory practical performance of the NTK regression estimator. In this paper, we introduce the Weighted Neural Tangent Kernel (WNTK), a generalized and improved tool that can capture an over-parameterized NN's training dynamics under different optimizers. Theoretically, in the infinite-width limit, we prove: i) the stability of the WNTK at initialization and during training, and ii) the equivalence between the WNTK regression estimator and the corresponding NN estimator trained with different learning rates on different parameters. With the proposed weight-update algorithm, both empirical and analytical WNTKs outperform the corresponding NTKs in numerical experiments.
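For concreteness, a minimal sketch of the form such a weighted kernel can take, assuming one nonnegative weight per parameter (the symbols $w_p$, $\theta_p$, $P$, and $f(x;\theta)$ are illustrative notation, not definitions taken from this abstract):
$$
K_{\mathrm{WNTK}}(x, x') \;=\; \sum_{p=1}^{P} w_p \,\frac{\partial f(x;\theta)}{\partial \theta_p}\,\frac{\partial f(x';\theta)}{\partial \theta_p}.
$$
Setting every $w_p = 1$ recovers the standard empirical NTK; under this reading, the weights $w_p$ play the role of the per-parameter learning rates referenced in the equivalence result above.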