We propose a spectral-based approach to analyze how two-layer neural networks separate from linear methods in approximating high-dimensional functions. We show that quantifying this separation can be reduced to estimating the Kolmogorov width of two-layer neural networks, which in turn can be characterized by the spectrum of an associated kernel. In contrast to previous work, our approach yields upper bounds, lower bounds, and explicit hard functions in a unified manner. We provide a systematic study of how the choice of activation function affects the separation, in particular its dependence on the input dimension. Specifically, for nonsmooth activation functions, we extend known results to a broader class of activations with sharper bounds. As a concrete example, we prove that any single neuron can instantiate the separation between neural networks and random feature models. For smooth activation functions, one surprising finding is that the separation is negligible unless the norms of the inner-layer weights are polynomially large with respect to the input dimension. By contrast, for nonsmooth activation functions the separation is independent of the norms of the inner-layer weights.
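To make the spectral characterization concrete, the following is a minimal numerical sketch of the kind of kernel spectrum the approach relies on. It assumes the associated kernel takes the random-feature form k(x, x') = E_w[σ(w·x) σ(w·x')] with norm-1 inner-layer weights, and it compares a nonsmooth activation (ReLU) with a smooth one (softplus); the specific weight distribution, activations, and Monte-Carlo estimator are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Sketch: estimate the eigenvalue decay of the kernel
#   k(x, x') = E_w[ sigma(w.x) sigma(w.x') ]
# associated with a two-layer network, via random features and an
# eigendecomposition of the Gram matrix. Gaussian weights projected to the
# sphere and the two activations below are illustrative assumptions only.

rng = np.random.default_rng(0)
d, n, m = 50, 500, 20000   # input dimension, number of samples, number of random features

# Inputs drawn uniformly on the unit sphere S^{d-1}.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Random inner-layer weights, also normalized to the sphere (norm-1 weights).
W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

def kernel_spectrum(activation):
    """Monte-Carlo estimate of the kernel Gram matrix and its eigenvalues."""
    features = activation(X @ W.T) / np.sqrt(m)   # n x m random-feature map
    K = features @ features.T                     # approximates k(x_i, x_j)
    eigvals = np.linalg.eigvalsh(K)[::-1]         # eigenvalues, descending
    return eigvals / n                            # normalize by sample size

relu = lambda z: np.maximum(z, 0.0)               # nonsmooth activation
softplus = lambda z: np.logaddexp(z, 0.0)         # smooth activation

for name, act in [("relu", relu), ("softplus", softplus)]:
    ev = kernel_spectrum(act)
    print(name, ev[:5], ev[100])                  # leading and tail eigenvalues
```

Under these assumptions, the tail eigenvalues for the smooth activation decay much faster than for ReLU, which is the qualitative behavior the spectral argument exploits when relating the Kolmogorov width to the separation between the two model classes.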