The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least one layer of $\Omega(N)$ neurons, $N$ being the number of training samples. Furthermore, there is increasing evidence suggesting that deep networks with sub-linear layer widths are powerful memorizers and optimizers, as long as the number of parameters exceeds the number of samples. Thus, a natural open question is whether the NTK is well conditioned in such a challenging sub-linear setup. In this paper, we answer this question in the affirmative. Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks with the minimum possible over-parameterization: the number of parameters is roughly $\Omega(N)$ and, hence, the number of neurons is as few as $\Omega(\sqrt{N})$. To showcase the applicability of our NTK bounds, we provide two results concerning memorization capacity and optimization guarantees for gradient descent training.
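As a concrete illustration (not taken from the paper), the following JAX sketch computes the quantity the abstract refers to: the empirical NTK Gram matrix $K = J J^\top$, where $J$ is the Jacobian of the $N$ network outputs with respect to all parameters, together with its smallest eigenvalue $\lambda_{\min}(K)$. The architecture, layer widths, and random data below are illustrative assumptions only; the widths are chosen on the order of $\sqrt{N}$ so that the parameter count exceeds $N$, mirroring the sub-linear regime discussed above.

```python
# Minimal sketch: empirical NTK Gram matrix and its smallest eigenvalue
# for a deep ReLU network with sub-linear (~sqrt(N)) layer widths.
# All sizes and data here are hypothetical choices for illustration.
import jax
import jax.numpy as jnp

def init_params(key, widths):
    """He-style Gaussian initialization for a deep ReLU network."""
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append(jax.random.normal(sub, (d_in, d_out)) * jnp.sqrt(2.0 / d_in))
    return params

def forward(params, x):
    """Scalar-output deep ReLU network applied to a batch of inputs."""
    h = x
    for W in params[:-1]:
        h = jax.nn.relu(h @ W)
    return (h @ params[-1]).squeeze(-1)

key = jax.random.PRNGKey(0)
N, d = 32, 16                # N training samples in d dimensions
widths = [d, 8, 8, 1]        # hidden widths ~ sqrt(N); total params > N
X = jax.random.normal(key, (N, d))
params = init_params(key, widths)

# Jacobian of the N outputs w.r.t. all parameters, flattened per sample.
jac = jax.jacrev(forward)(params, X)                 # one array per layer
J = jnp.concatenate([j.reshape(N, -1) for j in jac], axis=1)

K = J @ J.T                                          # empirical NTK Gram matrix (N x N)
lam_min = jnp.linalg.eigvalsh(K)[0]                  # smallest eigenvalue
print("smallest NTK eigenvalue:", lam_min)
```

A strictly positive $\lambda_{\min}(K)$ in this sketch corresponds to the NTK being well conditioned, which is the property the paper's lower bound establishes in the minimally over-parameterized regime.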