The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide neural networks trained under least squares loss by gradient descent. Recent works also report that NTK regression can outperform finitely wide neural networks trained on small-scale datasets. However, the computational complexity of kernel methods has limited their use in large-scale learning tasks. To accelerate learning with NTK, we design a near input-sparsity time approximation algorithm for NTK by sketching the polynomial expansions of arc-cosine kernels; our sketch for the convolutional counterpart of NTK (CNTK) can transform any image in time linear in the number of pixels. Furthermore, we prove a spectral approximation guarantee for the NTK matrix by combining random features of the arc-cosine kernels (based on leverage score sampling) with a sketching algorithm. We benchmark our methods on various large-scale regression and classification tasks and show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
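To make the role of the arc-cosine kernels concrete, the following is a minimal sketch, not the algorithm proposed here. It assumes the two-layer fully connected ReLU NTK without bias, written under one common normalization as $\Theta(x, y) = \kappa_1(x, y) + \langle x, y \rangle\, \kappa_0(x, y)$, where $\kappa_0$ and $\kappa_1$ are the order-0 and order-1 arc-cosine kernels in the Cho and Saul normalization. It uses plain Gaussian random features rather than leverage score sampling, and an explicit tensor product rather than a sketch, so the feature dimension grows as $m(d+1)$. All function names are hypothetical.

```python
import numpy as np

def arccos_kernels(X, Y):
    """Closed-form arc-cosine kernels of order 0 and 1.

    Assumes every row of X and Y is nonzero.
    """
    nx = np.linalg.norm(X, axis=1)[:, None]
    ny = np.linalg.norm(Y, axis=1)[None, :]
    cos = np.clip((X @ Y.T) / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos)
    k0 = 1.0 - theta / np.pi
    k1 = (nx * ny) * (np.sin(theta) + (np.pi - theta) * cos) / np.pi
    return k0, k1

def ntk_two_layer(X, Y):
    """Exact two-layer ReLU NTK under the assumed normalization."""
    k0, k1 = arccos_kernels(X, Y)
    return k1 + (X @ Y.T) * k0

def random_ntk_features(X, W):
    """Monte Carlo features whose inner products approximate ntk_two_layer.

    W has shape (m, d) with i.i.d. standard normal entries.
    """
    m = W.shape[0]
    Z = X @ W.T                                    # projections, shape (n, m)
    phi0 = np.sqrt(2.0 / m) * (Z > 0)              # step features -> kappa_0
    phi1 = np.sqrt(2.0 / m) * np.maximum(Z, 0.0)   # ReLU features -> kappa_1
    # Explicit tensor product x (x) phi0(x); this is the part a sketch
    # would compress. Shape (n, d * m).
    tensor = (X[:, :, None] * phi0[:, None, :]).reshape(X.shape[0], -1)
    return np.hstack([phi1, tensor])

# Illustrative comparison on a few unit-norm points (assumed toy data).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)
W = rng.standard_normal((20000, 4))
F = random_ntk_features(X, W)
max_abs_err = np.max(np.abs(F @ F.T - ntk_two_layer(X, X)))
print("max abs error:", max_abs_err)
```

The error of this plain Monte Carlo estimate shrinks roughly as $1/\sqrt{m}$. The explicit tensor product is kept only for clarity; replacing it with a sketch is what would keep the feature dimension manageable.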