Empirical neural tangent kernels (eNTKs) can provide a good understanding of a given network's representation: they are often far less expensive to compute and applicable more broadly than infinite-width NTKs. For networks with $O$ output units (e.g. an $O$-class classifier), however, the eNTK on $N$ inputs is of size $NO \times NO$, taking $O((NO)^2)$ memory and up to $O((NO)^3)$ computation. Most existing applications have therefore used one of a handful of approximations yielding $N \times N$ kernel matrices, saving orders of magnitude of computation, but with limited to no justification. We prove that one such approximation, which we call "sum of logits", converges to the true eNTK at initialization for any network with a wide final "readout" layer. Our experiments demonstrate the quality of this approximation for various uses across a range of settings.
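To make the size comparison concrete, the following is a minimal JAX sketch (not from the paper) contrasting the full eNTK with a sum-of-logits approximation. The function names `empirical_ntk` and `sum_of_logits_ntk`, the network signature `f(params, x)`, and the $1/\sqrt{O}$ scaling of the summed logits are illustrative assumptions, not the authors' code.

```python
import jax
import jax.numpy as jnp

def empirical_ntk(f, params, x1, x2):
    """Full eNTK: for N1, N2 inputs and O outputs, returns an (N1, O, N2, O)
    tensor whose (i, o, j, q) entry is the inner product of the parameter
    gradients of output o at x1[i] and output q at x2[j].
    Note: this materializes per-input Jacobians, so it only scales to small
    networks / batches."""
    jac1 = jax.vmap(lambda x: jax.jacobian(f)(params, x))(x1)  # leaves: (N1, O, ...)
    jac2 = jax.vmap(lambda x: jax.jacobian(f)(params, x))(x2)  # leaves: (N2, O, ...)

    def contract(a, b):
        a = a.reshape(a.shape[0], a.shape[1], -1)  # (N1, O, P)
        b = b.reshape(b.shape[0], b.shape[1], -1)  # (N2, O, P)
        return jnp.einsum("iop,jqp->iojq", a, b)

    return sum(jax.tree_util.tree_leaves(jax.tree_util.tree_map(contract, jac1, jac2)))

def sum_of_logits_ntk(f, params, x1, x2):
    """'Sum of logits' approximation: the eNTK of the scalar surrogate
    g(x) = sum_o f_o(x) / sqrt(O), which is only an (N1, N2) matrix."""
    def g(p, x):
        out = f(p, x)
        return jnp.sum(out) / jnp.sqrt(out.shape[-1])  # assumed 1/sqrt(O) convention

    g1 = jax.vmap(lambda x: jax.grad(g)(params, x))(x1)  # leaves: (N1, ...)
    g2 = jax.vmap(lambda x: jax.grad(g)(params, x))(x2)  # leaves: (N2, ...)

    def contract(a, b):
        return a.reshape(a.shape[0], -1) @ b.reshape(b.shape[0], -1).T  # (N1, N2)

    return sum(jax.tree_util.tree_leaves(jax.tree_util.tree_map(contract, g1, g2)))
```

Under the abstract's claim, one would expect the $N \times N$ matrix from `sum_of_logits_ntk`, broadcast across the $O$ output dimensions, to approximate the full $NO \times NO$ kernel at initialization when the readout layer is wide.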