The Neural Tangent Kernel (NTK) has revealed connections between deep neural networks and kernel methods, providing insights into optimization and generalization. Motivated by this, recent works report that the NTK can outperform trained neural networks on small-scale datasets. However, large-scale settings remain largely unexplored due to the computational limitations of kernel methods. In this work, we propose an efficient feature map construction for the NTK of fully-connected ReLU networks that enables its application to large-scale datasets. We combine random features of the arc-cosine kernels with a sketching-based algorithm that runs in time linear in both the number of data points and the input dimension. We show, both in theory and in practice, that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds. We further utilize leverage score based sampling to improve the bounds of arc-cosine random features, and we prove a spectral approximation guarantee of the proposed feature map with respect to the NTK matrix of a two-layer neural network. We benchmark on a variety of machine learning tasks to demonstrate the superiority of the proposed scheme. In particular, our algorithm runs orders of magnitude faster than exact kernel methods in large-scale settings without loss of performance.
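To make the construction concrete, below is a minimal NumPy sketch of the random-feature idea the abstract describes: ReLU random features approximate the first-order arc-cosine kernel, step-function random features approximate the zeroth-order one, and their combination approximates the NTK of a two-layer ReLU network. The tensor product here is formed explicitly for clarity; the paper's sketching step would compress it. The NTK formula is stated up to scaling conventions, and all function names are illustrative, not the paper's actual API.

```python
import numpy as np

def arccos_kernel(x, y, order):
    # Closed-form arc-cosine kernels of order 0 and 1 (Cho & Saul).
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    theta = np.arccos(np.clip(x @ y / (nx * ny), -1.0, 1.0))
    if order == 0:
        return (np.pi - theta) / np.pi
    return nx * ny * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

def ntk_two_layer(x, y):
    # NTK of a two-layer ReLU network (up to scaling conventions):
    # K(x, y) = k1(x, y) + (x . y) * k0(x, y).
    return arccos_kernel(x, y, 1) + (x @ y) * arccos_kernel(x, y, 0)

def ntk_features(X, m, rng):
    # Random feature map: phi1 approximates k1; the tensor product
    # x (x) phi0(x) approximates (x . y) * k0(x, y). The explicit
    # (d * m)-dimensional tensor product is what sketching would compress.
    d = X.shape[1]
    W = rng.standard_normal((d, m))
    Z = X @ W
    phi1 = np.sqrt(2.0 / m) * np.maximum(Z, 0.0)       # ReLU features -> k1
    phi0 = np.sqrt(2.0 / m) * (Z > 0).astype(float)    # step features -> k0
    tensor = np.einsum('nd,nm->ndm', X, phi0).reshape(len(X), d * m)
    return np.hstack([phi1, tensor])

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5))
F = ntk_features(X, m=20000, rng=rng)
K_approx = F @ F.T
K_exact = np.array([[ntk_two_layer(x, y) for y in X] for x in X])
print(np.max(np.abs(K_approx - K_exact)))  # shrinks as m grows
```

The Monte Carlo error of this plain construction decays like 1/sqrt(m); the leverage-score sampling and sketching in the abstract are what reduce the feature dimension needed for a given accuracy.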