The recent success of neural network models has shed light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of $\textit{benign overfitting}$ has attracted intense theoretical and empirical study. In this paper, we consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk when the covariates satisfy sub-Gaussianity and anti-concentration properties, and the noise is independent and sub-Gaussian. By leveraging recent results that characterize the implicit bias of this estimator, our bounds emphasize the role of both the quality of the initialization and the properties of the data covariance matrix in achieving low excess risk.
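For concreteness, the following is a minimal sketch of the setting described above, written with illustrative notation that is not taken from the paper (the symbols $W$, $v$, $\beta^{*}$, $\epsilon_i$, and $\mathcal{R}$ are assumptions made here for exposition):
\begin{align*}
  &\text{Model:} && f_{W,v}(x) = v^{\top} W x, \qquad W \in \mathbb{R}^{k \times d},\ v \in \mathbb{R}^{k},\\
  &\text{Data:} && y_i = \langle \beta^{*}, x_i \rangle + \epsilon_i, \qquad \epsilon_i \text{ independent and sub-Gaussian},\\
  &\text{Training loss:} && L(W, v) = \tfrac{1}{2} \sum_{i=1}^{n} \bigl( v^{\top} W x_i - y_i \bigr)^{2},\\
  &\text{Gradient flow:} && \dot{W}(t) = -\nabla_{W} L\bigl(W(t), v(t)\bigr), \qquad \dot{v}(t) = -\nabla_{v} L\bigl(W(t), v(t)\bigr),\\
  &\text{Excess risk:} && \mathcal{R}(W, v) = \mathbb{E}_{x}\Bigl[ \bigl( v^{\top} W x - \langle \beta^{*}, x \rangle \bigr)^{2} \Bigr].
\end{align*}
An interpolating solution is one reached in the limit $t \to \infty$ with $L(W(t), v(t)) \to 0$, i.e., the network fits the noisy training labels exactly while the quantity of interest is its excess risk on fresh data.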