The practice of deep learning has shown that neural networks generalize remarkably well even with an extremely large number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We set out to resolve this discrepancy from a convex optimization and sparse recovery perspective. We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization. Under certain regularity assumptions on the data, we show that ReLU networks with an arbitrary number of parameters learn only simple models that explain the data, analogous to the recovery of the sparsest linear model in compressed sensing. For ReLU networks and their variants with skip connections or normalization layers, we present isometry conditions that ensure exact recovery of planted neurons. For randomly generated data, we show the existence of a phase transition in recovering planted neural network models, and the transition is sharp: whenever the ratio between the number of samples and the dimension exceeds a numerical threshold, recovery succeeds with high probability; otherwise, it fails with high probability. Surprisingly, ReLU networks learn simple and sparse models even when the labels are noisy. The phase transition phenomenon is confirmed through numerical experiments.
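For concreteness, the following is a minimal numerical sketch of the kind of phase-transition experiment the abstract refers to, assuming a single planted ReLU neuron, Gaussian data, and an overparameterized two-layer network trained with weight decay in PyTorch. The network width, learning rate, step count, and the test-error recovery proxy are all illustrative assumptions, not the paper's exact setup or recovery criterion.

```python
# A minimal sketch (not the authors' code) of the phase-transition experiment:
# labels come from a single planted ReLU neuron, and an overparameterized
# two-layer ReLU network is trained with weight decay regularization.
# All hyperparameters and the recovery proxy below are illustrative assumptions.
import torch

torch.manual_seed(0)

def recovery_experiment(n, d, width=100, steps=3000, lam=1e-3):
    # Random Gaussian data; labels generated by a planted unit-norm neuron.
    X = torch.randn(n, d)
    w_star = torch.randn(d)
    w_star /= w_star.norm()
    y = torch.relu(X @ w_star)

    # Overparameterized two-layer ReLU network, trained with weight decay.
    model = torch.nn.Sequential(
        torch.nn.Linear(d, width, bias=False),
        torch.nn.ReLU(),
        torch.nn.Linear(width, 1, bias=False),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=lam)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.mean((model(X).squeeze(-1) - y) ** 2)
        loss.backward()
        opt.step()

    # Proxy for recovery: error on fresh data from the same planted model.
    # Exact recovery of the planted neuron implies near-zero test error.
    with torch.no_grad():
        X_test = torch.randn(5000, d)
        y_test = torch.relu(X_test @ w_star)
        return torch.mean((model(X_test).squeeze(-1) - y_test) ** 2).item()

# Sweep the sample-to-dimension ratio n/d; the abstract predicts that recovery
# succeeds with high probability once this ratio exceeds a numerical threshold.
d = 20
for ratio in [1, 2, 4, 8]:
    err = recovery_experiment(n=ratio * d, d=d)
    print(f"n/d = {ratio}: test error vs. planted neuron = {err:.4f}")
```

Under this sketch, the test error should drop sharply once n/d is large enough, mirroring the success/failure dichotomy described above; the exact threshold observed will depend on the assumed hyperparameters.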