The Neural Tangent Kernel (NTK), defined as $\Theta_\theta^f(x_1, x_2) = \left[\partial f(\theta, x_1)\big/\partial \theta\right] \left[\partial f(\theta, x_2)\big/\partial \theta\right]^T$ where $\left[\partial f(\theta, \cdot)\big/\partial \theta\right]$ is a neural network (NN) Jacobian, has emerged as a central object of study in deep learning. In the infinite width limit, the NTK can sometimes be computed analytically and is useful for understanding training and generalization of NN architectures. At finite widths, the NTK is also used to better initialize NNs, compare the conditioning across models, perform architecture search, and do meta-learning. Unfortunately, the finite width NTK is notoriously expensive to compute, which severely limits its practical utility. We perform the first in-depth analysis of the compute and memory requirements for NTK computation in finite width networks. Leveraging the structure of neural networks, we further propose two novel algorithms that change the exponent of the compute and memory requirements of the finite width NTK, dramatically improving efficiency. Our algorithms can be applied in a black box fashion to any differentiable function, including those implementing neural networks. We open-source our implementations within the Neural Tangents package (arXiv:1912.02803) at https://github.com/google/neural-tangents.
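To make the definition above concrete, the following is a minimal JAX sketch that computes the finite width NTK $\Theta_\theta^f(x_1, x_2) = J(x_1) J(x_2)^T$ by explicit Jacobian contraction. The toy two-layer network `f`, its parameter shapes, and the helper `ntk` are illustrative assumptions for this sketch only; they are not the paper's proposed fast algorithms, whose optimized implementations live in the Neural Tangents package linked above.

```python
import jax
import jax.numpy as jnp

def f(theta, x):
    """Toy two-layer MLP; theta = (W1, W2). Stands in for any differentiable model."""
    W1, W2 = theta
    return jnp.tanh(x @ W1) @ W2  # shape: (output_dim,)

def ntk(theta, x1, x2):
    """Finite width NTK via explicit Jacobian contraction: Theta = J(x1) @ J(x2)^T."""
    # Jacobians w.r.t. parameters, returned as pytrees matching theta's structure.
    j1 = jax.jacobian(f)(theta, x1)
    j2 = jax.jacobian(f)(theta, x2)

    def contract(a, b):
        # Flatten all parameter axes and contract over them:
        # Theta[i, j] = sum_p J1[i, p] * J2[j, p].
        return a.reshape(a.shape[0], -1) @ b.reshape(b.shape[0], -1).T

    leaves1 = jax.tree_util.tree_leaves(j1)
    leaves2 = jax.tree_util.tree_leaves(j2)
    # Sum contributions from every parameter array (here: W1 and W2).
    return sum(contract(a, b) for a, b in zip(leaves1, leaves2))

key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
theta = (jax.random.normal(k1, (8, 16)), jax.random.normal(k2, (16, 4)))
x1 = jax.random.normal(k3, (8,))
x2 = jax.random.normal(k4, (8,))
print(ntk(theta, x1, x2).shape)  # (4, 4): output_dim x output_dim
```

This direct contraction materializes full Jacobians and so scales poorly in both compute and memory as width and output dimension grow, which is precisely the cost the paper's algorithms reduce.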