Neural networks in the lazy training regime converge to kernel machines. Can neural networks in the rich feature learning regime learn a kernel machine with a data-dependent kernel? We demonstrate that this can indeed happen due to a phenomenon we term silent alignment, which requires that the tangent kernel of a network evolves in eigenstructure while still small in overall scale, before the loss appreciably decreases, and grows only in scale afterwards. We show that such an effect takes place in homogeneous neural networks with small initialization and whitened data. We provide an analytical treatment of this effect in the linear network case. In general, we find that the kernel develops a low-rank contribution in the early phase of training, and then evolves only in overall scale, yielding a function equivalent to a kernel regression solution with the final network's tangent kernel. The early spectral learning of the kernel depends on the depth of the network. We also demonstrate that non-whitened data can weaken the silent alignment effect.
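As a concrete illustration of the linear-network case treated analytically above, the following is a minimal numpy sketch (not the paper's experimental setup; the dimensions, learning rate, and initialization scale are arbitrary illustrative choices). It trains a two-layer linear network f(x) = aᵀWx from a small initialization on whitened (orthogonal) inputs, tracks the scale and task alignment of the empirical tangent kernel during training, and finally compares the trained network's predictions on held-out points against kernel regression with the final tangent kernel.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Whitened training data: orthogonal inputs, so X X^T = I_n. ---
d, n, h = 10, 10, 200                 # input dim, samples, hidden width
X, _ = np.linalg.qr(rng.standard_normal((d, d)))
X = X[:n]                             # n x d with orthonormal rows
w_star = rng.standard_normal(d)       # linear teacher
y = X @ w_star                        # targets

# --- Two-layer linear network f(x) = a^T W x with small initialization. ---
sigma = 1e-3
W = sigma * rng.standard_normal((h, d)) / np.sqrt(d)
a = sigma * rng.standard_normal(h) / np.sqrt(h)

def ntk(Xa, Xb):
    """Empirical tangent kernel of f(x) = a^T W x.

    The gradients are df/dW = a x^T and df/da = W x, so
    K(x, x') = (a . a)(x . x') + (W x) . (W x')."""
    return (Xa @ W.T) @ (W @ Xb.T) + (a @ a) * (Xa @ Xb.T)

def alignment(K):
    """Cosine similarity between K and the rank-one task kernel y y^T."""
    return (y @ K @ y) / (np.linalg.norm(K) * (y @ y))

lr, steps = 0.05, 20000
for t in range(steps):
    err = X @ W.T @ a - y                      # residuals, shape (n,)
    grad_a = W @ (X.T @ err) / n
    grad_W = np.outer(a, X.T @ err) / n
    a -= lr * grad_a
    W -= lr * grad_W
    if t % 4000 == 0:
        K = ntk(X, X)
        print(f"step {t:6d}  loss {np.mean(err**2):.3e}  "
              f"|K| {np.linalg.norm(K):.3e}  align {alignment(K):.3f}")

# --- Compare the trained network to kernel regression with its final NTK. ---
X_test = rng.standard_normal((5, d))
K_train = ntk(X, X)
K_cross = ntk(X_test, X)
f_net = X_test @ W.T @ a
f_krr = K_cross @ np.linalg.solve(K_train, y)
print("network predictions:  ", np.round(f_net, 3))
print("kernel regression:    ", np.round(f_krr, 3))
```

If silent alignment holds, the printed alignment should rise toward one while the loss is still near its initial value, the kernel norm should grow only afterwards, and the two prediction vectors at the end should nearly coincide, with the discrepancy shrinking as the initialization scale is decreased.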