Data with low-dimensional nonlinear structure are ubiquitous in engineering and scientific problems. We study a model problem with such structure: a binary classification task that uses a deep fully-connected neural network to classify data drawn from two disjoint smooth curves on the unit sphere. Aside from mild regularity conditions, we place no restrictions on the configuration of the curves. We prove that when (i) the network depth is large relative to certain geometric properties that set the difficulty of the problem and (ii) the network width and number of samples are polynomial in the depth, randomly-initialized gradient descent quickly learns to correctly classify all points on the two curves with high probability. To our knowledge, this is the first generalization guarantee for deep networks with nonlinear data that depends only on intrinsic data properties. Our analysis proceeds by a reduction to dynamics in the neural tangent kernel (NTK) regime, where the network depth plays the role of a fitting resource in solving the classification problem. In particular, via fine-grained control of the decay properties of the NTK, we demonstrate that when the network is sufficiently deep, the NTK can be locally approximated by a translationally invariant operator on the manifolds and stably inverted over smooth functions, which guarantees convergence and generalization.
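The role of depth in the NTK can be illustrated numerically. The sketch below is not taken from the paper; it uses the standard arc-cosine recursion for the infinite-width NTK of a fully-connected ReLU network (NTK parameterization, no biases) to compute the kernel between two unit vectors as a function of the angle between them, so one can observe how the kernel's shape depends on depth.

```python
import numpy as np

def ntk_angle(theta, depth):
    """Infinite-width NTK of a fully-connected ReLU network of the given
    depth, evaluated on two unit vectors separated by angle `theta`.

    Illustrative sketch using the standard arc-cosine kernel recursion:
      rho_{l+1}     = (sqrt(1 - rho_l^2) + (pi - arccos(rho_l)) rho_l) / pi
      rho_dot_{l+1} = (pi - arccos(rho_l)) / pi
      ntk_{l+1}     = rho_{l+1} + ntk_l * rho_dot_{l+1}
    """
    rho = np.cos(theta)  # layer-0 covariance of two unit inputs
    ntk = rho
    for _ in range(depth):
        rho_c = np.clip(rho, -1.0, 1.0)   # guard arccos against roundoff
        rho_dot = (np.pi - np.arccos(rho_c)) / np.pi
        rho = (np.sqrt(1.0 - rho_c**2)
               + (np.pi - np.arccos(rho_c)) * rho_c) / np.pi
        ntk = rho + ntk * rho_dot
    return ntk

# On the diagonal (theta = 0) the kernel grows linearly with depth,
# while it decreases monotonically as the angle opens up.
print(ntk_angle(0.0, 5))                  # equals depth + 1
print(ntk_angle(0.3, 5), ntk_angle(1.0, 5))
```

Here the depth-as-fitting-resource picture can be probed directly: sweeping `depth` and plotting the normalized kernel `ntk_angle(theta, L) / ntk_angle(0.0, L)` shows how the kernel's angular profile changes with depth, which is the quantity whose decay the paper's analysis controls.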