In deep learning, it is common to overparameterize neural networks, that is, to use more parameters than training samples. Quite surprisingly, training the neural network via (stochastic) gradient descent leads to models that generalize very well, while classical statistics would suggest overfitting. In order to gain an understanding of this implicit bias phenomenon, we study the special case of sparse recovery (compressive sensing), which is of interest in its own right. More precisely, in order to reconstruct a vector from underdetermined linear measurements, we introduce a corresponding overparameterized square loss functional, where the vector to be reconstructed is deeply factorized into several vectors. We show that, under a very mild assumption on the measurement matrix, vanilla gradient flow for the overparameterized loss functional converges to a solution of minimal $\ell_1$-norm. The latter is well known to promote sparse solutions. As a by-product, our results significantly improve the sample complexity for compressive sensing in previous works. The theory accurately predicts the recovery rate in numerical experiments. For the proofs, we introduce the concept of \textit{solution entropy}, which bypasses the obstacles caused by non-convexity and should be of independent interest.
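For concreteness, the following is a sketch of one standard instance of such a deep factorization, namely the elementwise (Hadamard) product parameterization; the specific form and the symbols $w_1,\dots,w_N$, $A$, $y$ are illustrative assumptions and not quoted from the abstract itself.
% Illustrative sketch (assumed Hadamard-product factorization).
% The vector to be reconstructed is factorized into N vectors w_1, ..., w_N,
% and vanilla gradient flow is run on the overparameterized square loss.
\begin{align*}
  \mathcal{L}(w_1,\dots,w_N)
    &= \tfrac{1}{2}\,\bigl\| A\,(w_1 \odot w_2 \odot \cdots \odot w_N) - y \bigr\|_2^2,
    \qquad A \in \mathbb{R}^{m \times n},\ m < n, \\
  \dot{w}_j(t)
    &= -\nabla_{w_j}\,\mathcal{L}\bigl(w_1(t),\dots,w_N(t)\bigr),
    \qquad j = 1,\dots,N,
\end{align*}
with the reconstruction given by $x(t) = w_1(t) \odot \cdots \odot w_N(t)$. In this notation, the abstract's claim is that, under a very mild assumption on the measurement matrix $A$ (and for suitable initialization), $x(t)$ converges to a solution of $Ax = y$ with minimal $\ell_1$-norm.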