Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima -- those around which the loss grows slowly -- appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyze overparameterized matrix and bilinear sensing, robust PCA, covariance matrix estimation, and single hidden layer neural networks with quadratic activation functions. In all cases, we show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions. For matrix completion, we establish weak recovery, although empirical evidence suggests exact recovery holds here as well. We conclude with synthetic experiments that illustrate our findings and discuss the effect of depth on flat solutions.
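As a concrete illustration of the setting described above (a hedged sketch, not the authors' experiments or code): the snippet below sets up overparameterized symmetric matrix sensing in JAX, measures flatness by the trace of the Hessian of the loss in the factor variable, and compares plain gradient descent with a run that adds a small Hessian-trace penalty as a simple proxy for seeking flat global minima. All dimensions, the penalty weight `lam`, the step size, and the iteration count are illustrative assumptions; under the abstract's claim, the flatter solution should land closer to the ground truth M* = X*X*^T.

```python
# Minimal sketch (assumed setup, not the paper's code): overparameterized
# symmetric matrix sensing with the trace of the Hessian as the flatness measure.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
d, r, k, m = 8, 2, 6, 25   # ambient dim, true rank, overparameterized rank, #measurements

# Ground truth M* = X* X*^T of rank r, rescaled so measurements are O(1).
key, k1, k2, k3 = jax.random.split(key, 4)
X_star = jax.random.normal(k1, (d, r))
M_star = X_star @ X_star.T
M_star = M_star / jnp.linalg.norm(M_star)

# Gaussian sensing matrices A_i and noiseless measurements y_i = <A_i, M*>.
A = jax.random.normal(k2, (m, d, d))
y = jnp.einsum('mij,ij->m', A, M_star)

def loss(X):
    """Least-squares sensing loss over the overparameterized factor X (d x k)."""
    resid = jnp.einsum('mij,ij->m', A, X @ X.T) - y
    return 0.5 * jnp.mean(resid ** 2)

def hessian_trace(X):
    """Trace of the Hessian of the loss with respect to the flattened factor."""
    f = lambda v: loss(v.reshape(d, k))
    return jnp.trace(jax.hessian(f)(X.reshape(-1)))

def run_gd(objective, X0, lr=0.05, steps=4000):
    """Plain gradient descent on the given objective."""
    g = jax.jit(jax.grad(objective))
    X = X0
    for _ in range(steps):
        X = X - lr * g(X)
    return X

lam = 1e-3                                # illustrative weight of the flatness penalty
X0 = 0.3 * jax.random.normal(k3, (d, k))
X_plain = run_gd(loss, X0)                                        # unregularized fit
X_flat = run_gd(lambda X: loss(X) + lam * hessian_trace(X), X0)   # biased toward flat minima

rel_err = lambda X: float(jnp.linalg.norm(X @ X.T - M_star) / jnp.linalg.norm(M_star))
print("plain GD    : loss", float(loss(X_plain)), " rel. recovery error", rel_err(X_plain))
print("flat-biased : loss", float(loss(X_flat)), " rel. recovery error", rel_err(X_flat))
```

The penalty here is only a convenient proxy for selecting flat global minima in a toy run; the abstract's statement concerns the minimizers of the Hessian trace over the set of global minima under standard statistical assumptions, not a particular algorithm.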