Recent results on the optimization and generalization properties of neural networks showed that, in a simple two-layer network, the alignment of the labels with the eigenvectors of the corresponding Gram matrix determines the convergence of the optimization during training. Such analyses also provide upper bounds on the generalization error. We experimentally investigate the implications of these results for deeper networks via embeddings. We regard the layers preceding the final hidden layer as producing different representations of the input data, which are then fed to the two-layer model. We show that these representations improve both optimization and generalization. In particular, we investigate three kernel representations fed to the final hidden layer: the Gaussian kernel and its approximation by random Fourier features, kernels designed to imitate representations produced by neural networks, and finally an optimal kernel designed to align the data with the target labels. The approximate representations induced by these kernels are fed to the neural network, and the optimization and generalization properties of the resulting model are evaluated and compared.
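As a minimal illustration of the random Fourier feature representation mentioned above (a sketch, not code from the paper; the function and parameter names such as random_fourier_features, sigma, and n_features are illustrative), the following NumPy snippet approximates the Gaussian kernel with an explicit random feature map whose inner products can serve as an input representation to a two-layer model.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Exact Gaussian (RBF) kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    sq_dists = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq_dists / (2 * sigma**2))

def random_fourier_features(X, n_features=2000, sigma=1.0, seed=0):
    # Random Fourier feature map z(x) = sqrt(2/D) * cos(W x + b), with
    # W ~ N(0, sigma^-2 I) and b ~ Uniform(0, 2*pi), so that
    # z(x)^T z(y) approximates the Gaussian kernel (Rahimi & Recht, 2007).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / sigma, size=(d, n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Compare the exact kernel with its random-feature approximation.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
K_exact = gaussian_kernel(X, X, sigma=2.0)
Z = random_fourier_features(X, n_features=2000, sigma=2.0, seed=0)
K_approx = Z @ Z.T
print("max abs error:", np.abs(K_exact - K_approx).max())
```

Under this kind of setup, the rows of Z (the approximate kernel representation) would be the inputs fed to the final hidden layer in place of the raw data.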