The Neural Tangent Kernel (NTK) is the wide-network limit of a kernel defined using neural networks at initialization, whose embedding is the gradient of the output of the network with respect to its parameters. We study the "after kernel", which is defined using the same embedding, except after training, for neural networks with standard architectures, on binary classification problems extracted from MNIST and CIFAR-10, trained using SGD in a standard way. Lyu and Li described a sense in which neural networks, under certain conditions, are equivalent to an SVM with the after kernel. Our experiments are consistent with this proposition under natural conditions. For networks with an architecture similar to VGG, the after kernel is more "global", in the sense that it is less invariant to transformations of input images that disrupt the global structure of the image while leaving the local statistics largely intact. For fully connected networks, the after kernel is less global in this sense. The after kernel tends to be more invariant to small shifts, rotations, and zooms; data augmentation does not improve these invariances. The (finite approximation to the) conjugate kernel, obtained using the last layer of hidden nodes, sometimes, but not always, provides a good approximation to the NTK and the after kernel.
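For concreteness, the kernels compared above can be written as follows; this is a minimal formalization of the definitions in the abstract, and the symbols $f$, $\theta_0$, and $\theta_T$ are our notation rather than necessarily the paper's. Writing $f(x;\theta)$ for the network output on input $x$ with parameters $\theta$, the feature embedding is $\nabla_{\theta} f(x;\theta)$, giving the kernel
\[
k_{\theta}(x, x') \;=\; \big\langle \nabla_{\theta} f(x;\theta),\; \nabla_{\theta} f(x';\theta) \big\rangle .
\]
The (empirical) NTK evaluates this at the initial parameters $\theta_0$, while the after kernel evaluates it at the parameters $\theta_T$ obtained after training.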