Spectral analysis is a powerful tool that decomposes any function into simpler parts. In machine learning, Mercer's theorem generalizes this idea, providing for any kernel and input distribution a natural basis of functions of increasing frequency. More recently, several works have extended this analysis to deep neural networks through the framework of the Neural Tangent Kernel. In this work, we analyze the layer-wise spectral bias of deep neural networks and relate it to the contributions of different layers in reducing the generalization error for a given target function. We utilize the properties of Hermite polynomials and spherical harmonics to prove that initial layers exhibit a larger bias toward high-frequency functions defined on the unit sphere. We further provide empirical results validating our theory on high-dimensional datasets for deep neural networks.