The practice of deep learning has shown that neural networks generalize remarkably well even with an extremely large number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We aim to address this discrepancy by adopting a convex optimization and sparse recovery perspective. We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization. Under certain regularity assumptions on the data, we show that ReLU networks with an arbitrary number of parameters learn only simple models that explain the data. This is analogous to the recovery of the sparsest linear model in compressed sensing. For ReLU networks and their variants with skip connections or normalization layers, we present isometry conditions that ensure the exact recovery of planted neurons. For randomly generated data, we show the existence of a phase transition in recovering planted neural network models, which is easy to describe: whenever the ratio between the number of samples and the dimension exceeds a numerical threshold, the recovery succeeds with high probability; otherwise, it fails with high probability. Surprisingly, ReLU networks learn simple and sparse models that generalize well even when the labels are noisy. The phase transition phenomenon is confirmed through numerical experiments.
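As a rough illustration of the kind of numerical experiment referred to above, the sketch below probes the n/d phase transition for recovering a single planted ReLU neuron with an over-parameterized two-layer network trained under weight decay. This is not the paper's experimental code: the function name `planted_neuron_trial`, the network width, the optimizer, the hyperparameters, and the cosine-similarity success criterion are all illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's code): empirically probe the
# n/d phase transition for recovering a single planted ReLU neuron.
import torch

def planted_neuron_trial(n, d, width=50, weight_decay=1e-3, steps=2000, seed=0):
    torch.manual_seed(seed)
    X = torch.randn(n, d)
    w_star = torch.randn(d)
    w_star /= w_star.norm()
    y = torch.relu(X @ w_star)                      # labels from one planted neuron

    # Over-parameterized two-layer ReLU network trained with weight decay
    # (standard L2 penalty on all weights), matching the setting described above.
    model = torch.nn.Sequential(
        torch.nn.Linear(d, width, bias=False),
        torch.nn.ReLU(),
        torch.nn.Linear(width, 1, bias=False),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=weight_decay)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(X).squeeze(-1), y)
        loss.backward()
        opt.step()

    # Declare success if the dominant hidden neuron aligns with the planted one.
    W1 = model[0].weight.detach()                   # shape (width, d)
    a = model[2].weight.detach().squeeze(0)         # shape (width,)
    k = torch.argmax(a.abs() * W1.norm(dim=1))      # index of the dominant neuron
    cos = torch.nn.functional.cosine_similarity(W1[k], w_star, dim=0)
    return cos.item() > 0.99

if __name__ == "__main__":
    d = 20
    for ratio in (1, 2, 4, 8):                      # sweep n/d to look for a transition
        successes = sum(planted_neuron_trial(ratio * d, d, seed=s) for s in range(5))
        print(f"n/d = {ratio}: recovered planted neuron in {successes}/5 trials")
```

Under this kind of sweep, one would expect recovery to fail for small n/d and succeed with high probability once n/d exceeds the threshold described in the abstract; the specific threshold and success criterion here are for illustration only.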